ABSTRACT


This thesis presents several interactive computer programs for the analysis of
multivariate data. A special case is that of panel data: multiple time series of short
length. The first program, BOXPLOTAB, handles this type of multivariate data; it is
an enhancement of an existing graphical technique for exploratory data analysis
known as BOXPLOTS. The program works by appending boxplots as column dividers
in a table of the raw data from which the boxplots are derived. This combination of the
raw data and its graphical representation improves the understanding of the
characteristics of the data in exploratory and descriptive applications; differencing and
tracing of data through the table are also implemented. This thesis also presents and
explores the use of other graphical techniques for exploratory analysis of
multivariate data, such as STAR plots, PROFILE plots, CODED SCATTER plots and
CODED DRAFTSMAN plots. These techniques are examined and implemented in a
series of computer programs which produce these graphical displays. A technical
description of each computer program is presented and user implementation procedures
are discussed. The programs are implemented in APL and run in conjunction with the
experimental IBM APL Graphics program GRAFSTAT. To demonstrate the use of
these techniques, an analysis is conducted on several sets of multivariate data.

While every effort has been made, within the time available, to ensure that the
programs are free of computational and logic errors, they cannot be considered
validated. Any application of these programs without additional verification is at
the risk of the user.

I. INTRODUCTION


A.  PREFACE 

One of the main problems in experimental statistics and experimental design is
the exploratory analysis of raw data. This problem is greatly enlarged when the data
presented to the statistician come from unknown multiple populations, that is, when
the data are multivariate. A special case is that of panel data: multiple time series of
short length. The initial purpose of the data analysis is to try to capture the most
important distributional characteristics of the data such as the range, location and
spread of the data points. For the experimental statistician the main tool available for
the analysis of the data is the graphical display of the marginal distributions of the
data, in order to visualize and gain a better understanding of these characteristics and
to compare them against those of the different populations. Following this, interactions
or dependencies can be examined, and this is the domain of multivariate data analysis.

B.  PURPOSE 

The purpose of this thesis is twofold: first, to add to the tabular display of the
original multivariate data an existing graphical technique known as the BOX PLOT (see
[Ref. 1]). This addition can be done in several alternative ways and is done in order to
better understand the populations and the relations between the different populations.
The second purpose of the thesis is to make available different computer programs to
exploit several other enhanced statistical graphical techniques for multivariate data
analysis. These techniques are: STAR plots, PROFILE plots, CODED SCATTER
plots and CODED DRAFTSMAN plots.

C.  BACKGROUND 

Presently, the BOXPLOT technique is one of the most common graphical
techniques used by data analysts, both outside and at the Naval Postgraduate School
(NPS). There are different software packages that provide these plots, some of which
are in the experimental IBM APL GRAFSTAT program and some in the IBM
Mainframe NON-IMSL library.

One of the most important limitations of this graphical display technique is the
absence of the raw data in the display; this absence is critical in the special case of
multiple box plots for the comparison of multiple sets of data. Having the raw data in
the display would provide an immediate identification of peculiar characteristics of the
data, such as pinpointing the outliers and the variability and/or relation of a sample
datum with respect to other samples. Once the BOXPLOTS are displayed on the screen
(or printed on a graph), the analyst has to go back to the original raw data in order to
identify these data points. This limitation is overcome by the new technique presented
herein, which is called the BOXPLOTTED TABLES. This technique has already been
implemented, and can be used on the NPS IBM 3033 computer using an APL (A
Programming Language) program, which makes use of the graphical capabilities of the
IBM experimental GRAFSTAT software package. In GRAFSTAT, an interactive
technique for identifying odd or outlying points is given. This implementation
highlights the importance of a technique for data point identification. However, one
does not always have access to such a program, and the ability to do this identification
on a printed page is important to a data analyst. The BOXPLOTTED TABLES do
precisely this. Note too that a primary concern in multivariate data analysis is to get as
much information on a two-dimensional page as possible. Thus having tabular and
distributional data together on one page, as in the BOXPLOTTED TABLES, is a step
in this direction.

There are other graphical techniques commonly used by data analysts, such as
SCATTER plots, STAR plots, PROFILE plots, CODED SCATTER plots and
CODED DRAFTSMAN plots (see [Ref. 1]). These techniques are mainly used to
enhance the interpretation and understanding of displayed multivariate data. Out of
these, the DRAFTSMAN and SCATTER plot techniques (without coded symbols) are
the only ones available at NPS up to this point. These other techniques are used to
display the data points in many different forms, giving a new perspective to the
interpretation of the original data.

This thesis presents a group of APL functions that make these graphical
techniques available to the experimental statistician at the NPS. Various
examples that show how to use this software to analyze and graphically display sample
data are given in the following chapters of this thesis.

D.  ORGANIZATION 

This thesis consists of three main blocks. The first one, Chapter Two, is dedicated
to explaining the technical aspects of these graphical techniques; the mathematical and
statistical attributes of each technique are treated in this chapter. The second block is
intended to introduce the user to the APL software code used to implement these
techniques. In Chapter Three, both the user and system requirements are explained;
and for the more technically oriented reader, a listing of the APL code is given in
Appendix A. In addition, several examples of program execution are listed in
Appendix B. The last block, composed of Chapter Four and Appendix C, is dedicated
to the exploratory analysis of several sets of sample data to demonstrate some of the
potential applications of these graphical approaches to statistical analysis.


II.  GRAPHICAL  TECHNIQUES 


A.  BOXPLOTTED  TABLES 

1.  Overview 

The BOXPLOT graphical technique was first conceived by Tukey as a method
to display an almost one-dimensional summary of the distribution characteristics of a
set of data; Chambers [Ref. 1] provides an excellent analysis of this technique. This
display shows some of the most prominent characteristics of the sample distribution
such as the median, the mean, the inter-quartile range and the outliers, if there are any.
In the case where the sample comes from multivariate data, the BOXPLOT is used not
only to show the individual characteristics of each subsample but, in addition, to
compare the behavior of these characteristics with respect to other samples (see
[Ref. 1: p. 89]). Figure 2.1 shows a BOXPLOT display. The BOXPLOT's almost
one-dimensional character, as opposed to the two-dimensional character of the familiar
histogram, facilitates comparison of the marginal properties of multivariate data sets.

One of the limitations of the BOXPLOT is the difficulty of identifying specific
values of interest such as outliers; if the identification of the outliers is the prominent
feature, the statistician must refer to the original data in order to identify
which data point each outlier corresponds to.

A solution to this limitation, suggested by Professor P.A.W. Lewis in an
unpublished work, is to show the original data and the BOXPLOT in the same
graphical tabular display. In this case, the BOXPLOTS are shown as dividers of the
original tabulated data, so that aberrations are readily apparent and checkable (see
Figure 2.2). This technique clearly requires the availability of high resolution graphics
and a sophisticated plotting and data manipulation package. This requirement is met
by the experimental APL GRAFSTAT program from IBM Research, which is being
used at the NPS on a test bed basis.

2.  Technical  details  of  BOXPLOTTED  tables 

Figure 2.1 BOXPLOT of California Hospital Data (Per Capita
Hospital Expenses, Years 1971-1975, in 14 Health Service Areas).

Figure 2.2 BOXPLOTTED Table of California Hospital Data (Per Capita
Hospital Expenses, Years 1971-1975, in 14 Health Service Areas).

In the BOXPLOT the top and bottom of the rectangle represent the upper
and lower quartiles of the data respectively. Therefore, the length of the rectangle
represents the inter-quartile range (Q(.75) - Q(.25) = IQR), where Q(a), for 0 < a < 1,
is the a-quantile of the sample. The mean of the data sample is shown by a small
circle, and the median by an asterisk inside the rectangle. The solid lines going out
from the top and bottom of the rectangle represent the adjacent values. These values
are defined as those data points greater than or equal to Q(.75) and less than or equal to
Q(.75) + IQR for the upper line, and those values less than or equal to Q(.25) and
greater than or equal to Q(.25) - IQR for the bottom line. Those data points that fall in
the range [(Q(.25) - IQR*1.5), (Q(.25) - IQR)] or [(Q(.75) + IQR), (Q(.75) + IQR*1.5)] are
called outliers and are represented by small light circles. The data points that fall
beyond the ranges of these outliers are called extreme outliers and are represented by
small black circles. As an example, for normally distributed data, approximately 5
percent of the points should be outliers and marked with light circles and only about
0.5 percent should be extreme outliers (see [Ref. 2]).
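
The classification rule just described can be sketched in a few lines of Python. This is only an illustration, not the APL code of the thesis: the function names are hypothetical, and the quantile convention (linear interpolation between order statistics) is an assumption, since the text does not say how Q(a) is computed.

```python
def quantile(values, a):
    """a-quantile by linear interpolation between order statistics.
    The interpolation rule is an assumption; the text does not fix one."""
    xs = sorted(values)
    pos = a * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def classify(values):
    """Label each value with the cut-offs stated in the text:
    inside the box, adjacent (within 1 IQR of the box),
    outlier (between 1 and 1.5 IQR), or extreme outlier (beyond 1.5 IQR)."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    iqr = q3 - q1
    labels = []
    for x in values:
        dist = max(q1 - x, x - q3, 0.0)  # distance outside the box
        if dist == 0.0:
            labels.append("box")
        elif dist <= iqr:
            labels.append("adjacent")
        elif dist <= 1.5 * iqr:
            labels.append("outlier")
        else:
            labels.append("extreme outlier")
    return labels
```

Because each label stays attached to the position of its value, the result can be printed next to the raw table, which is the idea behind the BOXPLOTTED table.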

To obtain a BOXPLOTTED table, a tabular display of the data is added to
the BOXPLOT display. At the bottom of each column the estimates for the mean,
median, variance and the rank correlation between that column and the next column to
the right are listed. The estimators for these parameters are defined as follows:

Let $X_{ij}$ be the entry in the $i$th row and $j$th column, and let $n$ be the number of
values in each column. Then

\[ \mathrm{Mean}_j = \bar{X}_j = \sum_{i=1}^{n} X_{ij} / n. \tag{2.1} \]

The Median is defined as follows. Let $\mathrm{MID}_j$ be $n/2$ if $n$ is even, and the smallest
integer larger than $n/2$ if $n$ is odd. Then

\[ \mathrm{Median}_j = X^*_j(\mathrm{MID}_j), \tag{2.2} \]

if $n$ is odd, or

\[ \mathrm{Median}_j = \left[ X^*_j(\mathrm{MID}_j) + X^*_j(\mathrm{MID}_j + 1) \right] / 2, \tag{2.3} \]

if $n$ is even, where $X^*_j$ represents the $j$th column sorted in descending order. The
estimator for the variance is

\[ \mathrm{Variance}_j = \sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2 / (n - 1). \tag{2.4} \]


Finally, Spearman's $\rho$ (RHO) rank correlation coefficient is defined as
follows: let $X_i$ and $Y_i$ be two sets of data, and let $R(X_i)$ and $R(Y_i)$ be the ranks of
$X_i$ and $Y_i$ as compared to the other X and Y values respectively, for $i = 1, 2, \ldots, n$.
$R(X_i) = 1$ if $X_i$ is the smallest of $X_1, X_2, \ldots, X_n$, $R(X_i) = 2$ if $X_i$ is the second
smallest, and so on, with rank $n$ being assigned to the largest of the $X_i$. The same
applies for $R(Y_i)$. When assigning the ranks, if a tie is found (when two or more
sample values are exactly equal to each other, they are tied), assign to each tied value
the average of the ranks that would have been assigned if there had been no ties (see
[Ref. 3: p. 252]). Then the estimator is

\[ \rho_j = \frac{\sum_{i=1}^{n} \left[ R(X_i) - (n+1)/2 \right] \left[ R(Y_i) - (n+1)/2 \right]}{n(n^2 - 1)/12}, \tag{2.5} \]

if there are no ties in the data. If there are ties in the data, then the estimator is

\[ \rho_j = \frac{\sum_{i=1}^{n} R(X_i) R(Y_i) - n\left((n+1)/2\right)^2}{\left[ \sum_{i=1}^{n} R(X_i)^2 - n\left((n+1)/2\right)^2 \right]^{1/2} \left[ \sum_{i=1}^{n} R(Y_i)^2 - n\left((n+1)/2\right)^2 \right]^{1/2}}. \tag{2.6} \]
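
As a rough illustration of the column summaries in equations (2.1) to (2.6), the following Python sketch computes the mean, the median, the variance and Spearman's rank correlation with average ranks for ties. It is not the thesis's APL implementation; all function names are hypothetical, and the median uses the usual middle-value definition on the sorted column.

```python
def mean(col):
    return sum(col) / len(col)


def median(col):
    xs = sorted(col, reverse=True)  # descending order, as in the text
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]              # middle element (0-based index)
    return (xs[mid - 1] + xs[mid]) / 2


def variance(col):
    m = mean(col)
    return sum((x - m) ** 2 for x in col) / (len(col) - 1)


def ranks(col):
    """Ranks 1..n, smallest value first; tied values share the average rank."""
    order = sorted(range(len(col)), key=lambda i: col[i])
    r = [0.0] * len(col)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and col[order[j + 1]] == col[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1       # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho in the tie-aware form of equation (2.6)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    c = n * ((n + 1) / 2) ** 2
    num = sum(a * b for a, b in zip(rx, ry)) - c
    den = ((sum(a * a for a in rx) - c) ** 0.5
           * (sum(b * b for b in ry) - c) ** 0.5)
    return num / den


def spearman_no_ties(x, y):
    """Spearman's rho in the form of equation (2.5); valid only without ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    h = (n + 1) / 2
    return sum((a - h) * (b - h) for a, b in zip(rx, ry)) / (n * (n * n - 1) / 12)
```

When the data contain no ties the two forms of the coefficient agree, so the tie-aware form can be used throughout.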

Once the BOXPLOTTED tables for the original data are obtained, it is then
possible to join with lines the values that have the same rank (order in magnitude)
within their corresponding columns. The statistician may choose to use this technique
when it is desirable to study any possible relation among variables with respect to time
(as in the case of multiple short time series), or with respect to magnitudes (as in the
case of data with mixed qualitative and quantitative information).

If the data are ordered (in descending order on the first column), then the
outlier in the first boxplot corresponds to the first value in the table. However, this
ordering may be lost in the second column, so that it is not clear that an outlier in the
second boxplot corresponds to the first value in the second column, and so forth. Thus
if a line is drawn linking the largest value in each column, two extreme results are
informative. If the line is straight (or almost straight), it means that the outliers in
successive columns come from the same source. If the line wanders, then there is no
structural relationship along columns (or time, if one considers panel data). As an
example, the study of health care expenses is a good prototype of multiple short time
series analysis.

Figure 2.3 BOXPLOTTED Table with Joining Lines
(Per Capita, California Hospital Expenses, Years 1971-1975).

Figure 2.4 BOXPLOTTED Table of Relative Differences, First Col.
(Per Capita, California Hospital Expenses, Years 1971-1975).

In Figure 2.2 one could draw a line joining the highest or lowest expenses
through time and see to which hospitals they belong. This is done in Figure 2.3,
where one can see the transition of health care cost
for the most and least expensive health service areas in California through the period
of 1971 to 1975. A plot with the connections is shown in Figure 4.2.
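
The idea of following an extreme value from column to column can be sketched as below. This is an illustrative Python fragment under assumed names (`trace_extreme`, `is_straight`), with the table held as a list of rows; it is not the thesis's program.

```python
def trace_extreme(table, largest=True):
    """For each column, return the index of the row holding the largest
    (or smallest) value. `table` is a list of rows of equal length."""
    pick = max if largest else min
    n_rows = len(table)
    n_cols = len(table[0])
    return [pick(range(n_rows), key=lambda i: table[i][j]) for j in range(n_cols)]


def is_straight(path):
    """True when the extreme value comes from the same row in every column,
    i.e. the joining line would be straight in a table sorted on that row."""
    return len(set(path)) == 1
```

A straight path suggests that one source (row) is responsible for the extreme in every column; a wandering path suggests no such structure.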

In addition to this option, the user can display the differences between the
values of the columns. These differences can be taken relative to the first column (base
column) or with respect to the previous one. When the statistician is dealing with panel
data, it is desirable to study the trend of the relative (or absolute) rate of change in the
data points. Again, in the analysis of health care expenses, one may want to study the
relative (or absolute) rate of change in this cost through a given period. In Figure 2.4,
it is possible to infer that the relative change of health care expenses is not linear
within the period of study. This inference would be enhanced by a plot of differences,
as is done in Chapter Four. It is also possible to readily identify those health service
areas that had the extreme rates of change. An analysis of these data is presented in
Chapter Four.
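
A minimal Python sketch of the differencing option follows. The function name and its parameters are hypothetical, and the definition of a relative difference as the change divided by the reference value is an assumption; the text does not spell out the formula used by the program.

```python
def differences(table, base="first", relative=False):
    """Difference each column against the first column (base="first")
    or against the column to its left (base="previous").
    With relative=True the change is divided by the reference value,
    which is an assumed definition of 'relative difference'."""
    out = []
    for row in table:
        new_row = []
        for j in range(1, len(row)):
            ref = row[0] if base == "first" else row[j - 1]
            d = row[j] - ref
            new_row.append(d / ref if relative else d)
        out.append(new_row)
    return out
```

For a row of yearly values 100, 110, 121, differencing against the first column gives 10 and 21, while differencing against the previous column gives 10 and 11.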

B.  STAR  PLOTS 
1.  Overview 

In working with multivariate data, one of the key problems is how to
represent more than two variables (dimensions) in a single display. There are several
graphical approaches to deal with this problem, as mentioned in Chambers [Ref. 1].
Four of these techniques are treated in this thesis: STAR plots, PROFILE plots,
CODED SCATTER plots and CODED DRAFTSMAN plots.

In  the  STAR  plot  each  subpopulation  is  displayed  by  a  star  in  which  each  arc 
(or  ray)  represents  a  variable  of  interest.  The  value  of  each  variable  is  coded  by  the 
length  of  the  corresponding  arc;  to  avoid  overlapping  between  arcs,  these  are  portrayed 
symmetrically  about  the  origin.  This  can  be  seen  in  Figure  2.5  and  Figure  2.6,  in  which 
some  characteristics  of  automobile  data  are  displayed  (a  complete  description  of  the 
data  is  presented  in  Chapter  Four).  In  Figure  2.5,  twelve  variables  of  interest  are 
assigned  to  the  rays  of  the  star  (i.e.,  price,  length,  etc.).  In  Figure  2.6,  the  same 
representation  is  used  to  portray  the  same  information  but  for  several  automobile 
subpopulations  (models).  It  is  now  easy  to  graphically  compare  these  characteristics 
(by  the  corresponding  length  of  each  ray)  among  the  different  subpopulations  (models). 

Figure 2.5 Assignment of Variables
to the Rays of the STAR (Automobile Data).

2. Technical details of STAR plots

There are two essential features in the construction of the STAR plot: the
lengths of the rays and the angles between the rays. As stated in the last section, the
values of the variables are represented by the lengths of the rays; therefore, these values
should be non-negative and be represented using a similar scale. This is accomplished
by initially rescaling the values of the variables using the following formula:

\[ X^*_{ij} = (1 - c)(X_{ij} - \mathrm{min}_j) / (\mathrm{Max}_j - \mathrm{min}_j) + c, \tag{2.7} \]

where $c$ is a constant, usually given a value of zero, and $X_{ij}$ represents the $i$th
observation of the $j$th variable. The coefficients $\mathrm{min}_j$ and $\mathrm{Max}_j$ represent the minimum
and maximum value of the $j$th variable respectively. Once the rescaling of the variables
is performed, the angle between the rays must be determined. The first variable
(variables are enumerated in increasing order) is plotted on the horizontal axis at an
angle of zero degrees. Then the angle for the $j$th of the remaining variables is calculated
using the following formula:

\[ \omega_j = 2\pi (j - 1) / n, \tag{2.8} \]

where $n$ represents the number of variables (parameters), and $j$ denotes the $j$th variable.
The rays are then enumerated from 2 to $n$ and displayed counterclockwise. Finally, the
star is constructed by joining the end points of the $n$ rays. The end point of each ray is
calculated by the following formula:

\[ P_{ij} = (X^*_{ij} R \cos \omega_j,\; X^*_{ij} R \sin \omega_j), \tag{2.9} \]

where $j = 1, 2, \ldots, n$ indexes the variables, and $R$ is the maximum allowable radius of
the star.
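
The construction in equations (2.7) to (2.9) can be sketched in Python as follows. The sketch reads (2.9) as the rescaled value times R times the cosine and sine of the ray angle, which is an interpretation of the formula; the function names are hypothetical, and each variable is assumed to take at least two distinct values so that the rescaling denominator is not zero.

```python
import math


def rescale(column, c=0.0):
    """Equation (2.7): map one variable onto [c, 1].
    Assumes max(column) > min(column)."""
    lo, hi = min(column), max(column)
    return [(1 - c) * (x - lo) / (hi - lo) + c for x in column]


def star_vertices(scaled_row, radius=1.0):
    """Equations (2.8) and (2.9): end points of the n rays for one
    observation, given its already rescaled values."""
    n = len(scaled_row)
    points = []
    for j, x in enumerate(scaled_row):   # j = 0 is the first variable
        angle = 2 * math.pi * j / n      # (j - 1) in the 1-based notation
        points.append((x * radius * math.cos(angle),
                       x * radius * math.sin(angle)))
    return points
```

Joining consecutive vertices, and the last vertex back to the first, gives the outline of the star for that observation.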

C.  PROFILE  PLOTS 
1.  Overview 

The PROFILE technique is similar in nature to the STAR plot; the only
difference is that in the PROFILE plot the rays are displayed by equidistant vertical
lines arising from a common horizontal axis. In fact, as stated in Chambers [Ref. 1: p.
159], the STAR plots are actually PROFILE plots conceived in polar coordinates. In
the PROFILE plots, the values of the variables are used to control the length of the
ends of the connected line segments (see Figure 2.7 and Figure 2.8).

One of the possible advantages of the PROFILE plots over the STAR plots is
that in the former it is possible to represent variables with negative values. Since in the
PROFILE plots all value vectors are displayed with respect to a horizontal axis, it is
easy to show variables with negative values. The base line, the horizontal axis, is
used to represent zero, and negative values are displayed by lines dipping below this
line. In the STAR plots this is not possible.

2.  Technical  details  of  PROFILE  plots 

In the PROFILE plot the rescaling is performed using the same formula as for
the STAR plot. When the variables take negative values, the following rescaling
formula is used instead:

\[ X^*_{ij} = X_{ij} / \mathrm{Max}_j. \tag{2.10} \]
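
A short Python sketch of the rescaling in equation (2.10) is given below. The function names are hypothetical, the column maximum is assumed to be positive, and the equally spaced horizontal positions 1, 2, ..., n are an assumed layout for the profile.

```python
def profile_rescale(column):
    """Equation (2.10): divide each value by the column maximum.
    Assumes the maximum is positive; negative values stay negative."""
    top = max(column)
    return [x / top for x in column]


def profile_points(scaled_row):
    """Place the rescaled values of one observation at equally spaced
    horizontal positions 1, 2, ..., n (an assumed layout)."""
    return [(j + 1, x) for j, x in enumerate(scaled_row)]
```

Values below zero map to points under the base line, which is how the PROFILE plot shows negative data.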

D.  CODED  SCATTER  PLOTS 

1.  Overview 

The CODED SCATTER plot is an enhancement of the most commonly used
technique, namely the SCATTER plot for two variables. Using this coding technique it
is possible to represent more than two variables (dimensions) in the same display.
Different symbols, and sizes and colors of these symbols, are used to represent three or
higher dimensional data. The size and color of the symbols can be used to encode
different ranges of data values.

2.  Technical  details  of  CODED  SCATTER  plot 

The CODED SCATTER plot uses essentially the same plotting technique as
the usual SCATTER plot; only coded symbols, sizes and colors are added. This is in
line with the need to represent as many dimensions as possible from a multivariate data
set on a two-dimensional graph. Thus, in a CODED SCATTER plot the position of
each point in the graphical plane is given by the (X, Y) values of the two
variables. Then, if X is the miles per gallon variable in a data set and Y is the price of
the car, plotting Y vs X shows how gas consumption increases or decreases with
increasing cost of a car, or that there is a much more complex relationship between the
two variables, or even that there is no relationship at all. However, there are other
variables or factors involved in the relationship. These may be continuous,
discrete or categorical factors. An example of the first is the weight of the car; an
example of the second is the number of cylinders in the car. A categorical factor is
origin, i.e. whether the car is domestically produced or not.


Figure 2.9 CODED SCATTER Plot of the Automobile Data
(Price vs M.P.G. City).

Figure 2.9 shows a CODED SCATTER plot of the car price variable, X,
versus the miles per gallon variable, Y. The best way to code the origin of the car is by
using colors; however, due to reproduction problems, this has been encoded as symbol
type. The weight of the car is coded as the size of the symbol. In Figure 2.9, one can
see that increasing price gives lower m.p.g., although the relationship is far from linear.
The other factor is weight; weight clearly increases with price, and m.p.g. decreases with
weight. Again, referring to the categorical variable, origin, American cars cost more
than foreign cars, weigh more and get less mileage, although there is interaction and
overlap between all of these variables. There are also a few outliers. The very high
mileage, low cost, and light weight car is the VW Rabbit (Diesel) and the very heavy,
low mileage, and high cost car is the Cadillac Seville.
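
The coding described above, a categorical factor mapped to symbol type and a continuous factor mapped to symbol size, can be sketched in Python as below. The function name, the symbol set and the size limits are hypothetical choices for illustration, and the numbers in the usage note are made up rather than taken from the automobile data.

```python
def encode_points(x, y, group, magnitude, min_size=4.0, max_size=12.0):
    """Return one plotting record per observation.
    group     -> symbol type (categorical factor, e.g. origin)
    magnitude -> symbol size (continuous factor, e.g. weight)"""
    symbols = "ox+*"                     # hypothetical set; at most 4 categories
    cats = sorted(set(group))
    lo, hi = min(magnitude), max(magnitude)
    span = (hi - lo) or 1.0              # avoid dividing by zero
    points = []
    for xi, yi, g, m in zip(x, y, group, magnitude):
        size = min_size + (max_size - min_size) * (m - lo) / span
        points.append({"x": xi, "y": yi,
                       "symbol": symbols[cats.index(g)], "size": size})
    return points
```

For example, `encode_points([4000, 15000], [40, 15], ["foreign", "domestic"], [2000, 4000])` gives the lighter car the smallest symbol and the heavier car the largest, with a different symbol type for each origin.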

E. CODED DRAFTSMAN PLOTS


1.  Overview 

The DRAFTSMAN plot is an arrangement of SCATTER plots in which any
adjacent pair of plots have a common axis (see [Ref. 1: p. 145]). In this way, the
practitioner can observe the relationships of the variables within a specific plot and, in
addition, can follow any particular observation (or group of observations) through the
sequence of plots. Therefore, the analysis of multiple interactions among the variables
is possible. This DRAFTSMAN plot can be further enhanced by portraying one or
several additional variables through the assignment of symbols, sizes of the symbols, and
colors to the already displayed variables. This is the main idea behind the CODED
DRAFTSMAN plot, in which the techniques used in both DRAFTSMAN and
CODED SCATTER plots are combined to display a single plot.

2.  Technical  details  of  CODED  DRAFTSMAN  plot 

The CODED DRAFTSMAN plot, as mentioned earlier, can be seen as a
displayed array of several CODED SCATTER plots. In certain situations the
SCATTER plots may be deceiving due to the overlapping of data points. This situation
may require jittering one or more variables in order to alleviate the problem. Also, the
practitioner may want to transform the data in order to achieve a simpler and more
understandable picture and in this way facilitate the analysis of the relationships
among the variables (see [Ref. 4]). Another technique used by statisticians to reduce
the spread in the data, and to enhance the visual interpretation of the plots, is to
smooth the data, relying on the Moving Averages technique or the Locally Weighted
Regression (LOWESS) for this purpose. For further explanation of these two
techniques see Moran [Ref. 5]. These techniques (jitter, transformation, and smoothing)
are included in the CODED DRAFTSMAN program presented in this thesis, and they
can be invoked interactively.
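
Two of the aids mentioned above, jittering and moving-average smoothing, can be sketched in Python as follows. This is only an illustration under assumed conventions: the jitter is uniform noise of a chosen half-width, the moving average uses a centered window that shrinks at the ends, and the values are assumed to be already ordered by the horizontal variable. LOWESS is not shown. The function names are hypothetical.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, amount] to separate overlapping points.
    A fixed seed keeps the display reproducible."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


def moving_average(values, window=3):
    """Centered moving average with an odd window; near the ends the
    window is truncated to the points that exist."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Jittering changes only how the points are drawn, so the amount should be small relative to the spacing of the data values.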


III.  COMPUTER  PROGRAMS  :  USER  INSTRUCTIONS  AND 
TECHNICAL  DESCRIPTION 


A.  GENERAL 

This chapter provides detailed instructions on how to use the computer programs
presented in this thesis. These programs were written in APL and are designed to be
used in conjunction with the experimental IBM graphical software GRAFSTAT. All
these programs are interactive, and all user-defined parameters and option selections
are entered in response to program queries. Although no APL skills are required to
operate these programs, it is recommended that the user become familiar with APL
system commands and procedures to load and copy workspaces, groups and variables,
and understand the meaning of workspace, variables, groups and vectors in APL
terminology. The user should read VS APL AT NPS [Ref. 6] before attempting to use
these programs. For the experienced APL user it will be easy to make changes to these
programs in order to accommodate any additional requirement.

These programs were designed to be used on the IBM 3033 computer and to be
executed using an IBM 3277 TEK 618, 3278/3279 or 3179G2 graphic display terminal
with a memory capacity of at least 2 Megabytes.

All of these programs are contained within an APL workspace called
APLGRAFS, and are organized in different groups of functions (each group contains
those functions related to a specific program application). The reader can find a list of
groups and the content of each group of functions in Appendix A2. In order to make
use of these programs, the user must have access to this workspace and to the APL
workspace called GRAFSTAT.

There are two ways of executing these programs. The first one is described in the
following steps:

1) LOGON to the system.

2) Once in CMS, enter APLGST.

3) At the prompt CLEAR WS, type )LOAD GRAFSTAT.

4) Enter )PCOPY APLGRAFS groupname, where groupname is one of the groups
listed in Appendix A2.

5) Enter the name of the desired program to be executed (e.g., STARPLOT) and
then answer the queries.

The second mode is more user-friendly. The steps that must be followed are:

1) LOGON to the system.

2) Once in CMS, type APLGRAFS (this will cause the execution of the macro
APLGRAFS EXEC); you will then see a menu describing all the available
programs (see Figure 3.1).

3) After you enter the number corresponding to the selected program, you only
have to follow the instructions given on the screen (this macro will execute
steps 2), 3), and 4) of the previous list for you).


    FILE: MENU MENU A1

    YOU HAVE THE FOLLOWING PROGRAMS TO USE

    (1) STAR AND PROFILE PLOTS
    (2) BOX PLOTTED TABLES
    (3) SYMBOLIC SCATTER PLOTS
    (4) DRAFTSMAN DISPLAY
    (5) LOWESS
    (6) EXPLANATION ON THESE FUNCTIONS
    (7) QUIT

    TYPE THE NUMBER CORRESPONDING TO THE PROGRAM YOU WANT

Figure  3.1  Menu  Presented  by  APLGRAFS  EXEC. 


B.  PROGRAM  DESCRIPTION 

In order to use any of these programs, the user will need a matrix containing the
sample data. This matrix could be in a CMS file, or could be a character array in an
APL workspace. The programs will accept the data in either way; just follow the
instructions given by the program as to the location of the data set. In addition, the
user will need an APL two-dimensional character array containing the names of the
variables which will appear in the display. These names are the labels which will be
shown on the axes of the plots or in the rows and columns of the tables, as in Figures
2.2 and 2.4. If the user has not previously created this array, the programs will allow
the user to enter the labels directly in response to a sequential series of queries. At this
point, the user is ready to execute any of the programs.

When answering the queries, if the user enters an erroneous response, the program
will prompt for a correct response only where it is able to check validity: for example,
a YES or NO question, a range question (e.g., 3, 4, or 5 plots), or the name of an APL
matrix that does not exist in the workspace. In all other cases, the program does not
have any means of knowing the validity of the response, so it will accept any response
as a correct one. To cancel the execution of one of these programs at any time, the
user must hit the PA2 key.
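
The re-prompting behavior described above can be sketched as follows. This is a minimal illustration in Python, not the APL code of the thesis programs; the function names `ask_yes_no` and `ask_in_range` are hypothetical, and the responses are passed in as a sequence so that the example does not depend on a terminal.

```python
def ask_yes_no(responses):
    """Return True/False for the first valid YES/NO answer in `responses`."""
    for answer in responses:
        text = answer.strip().upper()
        if text in ("YES", "Y"):
            return True
        if text in ("NO", "N"):
            return False
        # Invalid answer: move on to the next response (the "re-prompt").
    raise ValueError("no valid YES/NO response supplied")


def ask_in_range(responses, allowed):
    """Return the first response that parses as an integer found in `allowed`."""
    for answer in responses:
        try:
            value = int(answer.strip())
        except ValueError:
            continue  # not a number: re-prompt
        if value in allowed:
            return value
    raise ValueError("no valid response supplied")
```

Free-text responses such as titles cannot be checked this way, which is why the programs accept them as entered.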

1.  BOXPLOTTED  TABLES 

This program is executed by entering the command BOXPLOTAB. Once this
command is entered, the program will start running by prompting the user with a
sequence of queries asking the user to enter the input arrays and to select among the
available options (see Appendix B1 for an example of program execution).

a.  Input  requirement. 

(1) The array containing the sample data, the array containing the names (labels)
of the columns, and the array containing the names (labels) of the rows.

(2)  The  title  of  the  display. 

b.  Options. 

(1) The data can be displayed as originally entered or ordered (sorted) by the
first column.

(2) Once the BOXPLOTTED tables are shown on the screen, the user will be
prompted as to whether or not he or she wants to join the data points of the
same position with lines. The position is given by the order of the data points
of the first column (see Figure 2.3).

(3) After finishing with the previous display, the user will be asked whether or
not he or she wants to see BOXPLOTTED tables of the differences
between columns and, if so, whether absolute or relative differences are
desired. The differences can be calculated in two ways: differences between all
other columns and the first one, or differences between adjacent
columns (see Figure 2.4).
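
The two differencing schemes in option (3) can be sketched as follows. This is an illustrative Python sketch, not the thesis's APL implementation; the function name `column_differences` is hypothetical, and the relative difference is assumed here to be the absolute difference expressed as a percentage of the reference value.

```python
def column_differences(table, reference="first", relative=False):
    """Return differences between the columns of a row-major table.

    reference="first":    column j minus column 0, for j = 1, 2, ...
    reference="adjacent": column j minus column j-1.
    relative=True expresses each difference as a percentage of the
    reference value (an assumed definition, see the text above).
    """
    if reference not in ("first", "adjacent"):
        raise ValueError("reference must be 'first' or 'adjacent'")
    result = []
    for row in table:
        diffs = []
        for j in range(1, len(row)):
            base = row[0] if reference == "first" else row[j - 1]
            diff = row[j] - base
            if relative:
                diff = 100.0 * diff / base
            diffs.append(diff)
        result.append(diffs)
    return result
```

A table with k columns yields a difference table with k - 1 columns in either scheme, and each of those columns can then be summarized by its own boxplot.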

2.  STAR  PLOTS  and  PROFILE  PLOTS 

These two programs are executed by entering the command STARPLOT. The
program will start running and the user will be asked to enter the desired function: (S)
for STAR PLOT or (P) for PROFILE PLOT (see Appendices B2 and B3 for examples
of the execution of this program).

a.  Input  Requirements. 

(1)  Same  as  for  the  BOXPLOTTED  TABLES. 

b.  Options. 

(1) Whether the whole original data set is to be used or just a subsample of the data.
The subsample can be constructed by selecting specific columns and/or
rows.

(2) The user will be asked how many plots per screen are desired. This can be
3, 4 or 5 plots per screen.

3.  CODED  SCATTER  PLOT 

The execution and the input requirements of this program are similar to those
of the STAR PLOT. To execute the program, enter the command SCATPLOT (see
Appendix B4 for an example of the execution of the program).

a. Options.

(1)  The  user  will  be  asked  to  enter  the  title  for  the  screen  and  the  title  for  each 
plot  on  the  screen. 

(2)  For  each  plot,  the  user  must  enter  the  column  to  be  used  on  the  X  and  Y  axis, 
and  whether  or  not  the  entire  data  or  a  subsample  of  it  is  desired. 

(3)  Another  option  is  to  select  whether  the  data  is  to  be  jittered  or  if  a 
transformation  of  the  data  (specified  by  the  user)  is  desired. 

(4)  Following  this  option,  the  user  must  select  the  position  of  each  plot  on  the 
screen  (1,  21,  22,  ...,  etc.). 

(5) Finally, the user must specify the symbols, colors and sizes
that will be used to represent each specific subset of the data. If the user selected
one plot per screen, the program will ask for a short description of each one
of these subsets or categories. These subsets are defined using APL
statements. The user can specify more than one subset in the same plot and
more than one plot per screen (see Figure 2.9).
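
The jittering of option (3) and the subset coding of option (5) can be sketched as follows. This is a minimal Python sketch rather than the APL of the thesis: the names `jitter` and `code_points` are hypothetical, the uniform noise model is an assumption, and Python predicates stand in for the APL statements that define the subsets.

```python
import random


def jitter(values, amount, seed=None):
    """Add uniform noise in [-amount, +amount] to each value (assumed noise model)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


def code_points(rows, categories, default="o"):
    """Assign a plotting symbol to each row.

    `categories` is a list of (predicate, symbol) pairs; the first predicate
    that is true for a row decides its symbol. Rows matching no predicate
    receive the `default` symbol.
    """
    symbols = []
    for row in rows:
        for predicate, symbol in categories:
            if predicate(row):
                symbols.append(symbol)
                break
        else:
            symbols.append(default)
    return symbols
```

For example, with made-up rows whose last entry is 1 for a foreign car and 0 for an American one, `code_points(rows, [(lambda r: r[-1] == 1, "F"), (lambda r: r[-1] == 0, "A")])` returns one symbol per row.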

4.  CODED  DRAFTSMAN  PLOTS 

This program is executed by entering the command DRAFTSMAN. Once this
command is entered, the program will start by prompting the user with a sequence of
queries asking the user to enter the input arrays and to select among the available
options (see Appendix B5 for an example of program execution).

a.  Input  requirement. 

(1)  The  array  containing  the  sample  data,  and  the  array  containing  the  names 
(labels)  of  the  columns. 

b.  Options. 

(1)  The  data  could  be  used  as  originally  entered,  or  could  be  jittered  or 
transformed.  The  user  must  select  the  desired  option. 

(2) Select whether or not a smoothed curve will be fitted to the data in all plots
on the screen. If the smoothed curve is selected, the user must indicate
whether the Moving Average or LOWESS technique will be used.

(3) Select between using the CODED DRAFTSMAN or the regular
DRAFTSMAN plot. If the former is selected, the user must enter an APL
expression, a symbol and its size, and the color for each category to be
represented.

(4) Select the number of plots desired per screen (the available options are 3, 4 or
5 rows and columns of plots per screen).

(5) Once the display is shown on the screen, and if the answer to option (2) was
no, the user now has the alternative of fitting a smooth curve to the data of
a specific plot.
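
Of the two smoothing choices in option (2), the moving average is the simpler one and can be sketched as follows. This is an illustrative Python sketch, not the thesis's APL code; the function name `moving_average_smooth`, the centered window, and the truncation of the window at the ends of the data are all assumptions.

```python
def moving_average_smooth(x, y, window=3):
    """Return (x_sorted, y_smoothed) using a centered moving average.

    Points are sorted by x; each smoothed value is the mean of the y values
    in a window centered on the point, truncated at the ends of the data.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        lo = max(0, i - half)
        hi = min(len(ys), i + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    return xs, smoothed
```

LOWESS replaces the plain mean in each window with a locally weighted regression, which makes the curve less sensitive to isolated extreme points.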


IV.  DATA  ANALYSIS 


A.  GENERAL 

The primary purpose of this chapter is to demonstrate the application of the
graphical techniques presented in this thesis, namely BOXPLOTTED tables, STAR
plots, PROFILE plots, CODED SCATTER plots and CODED DRAFTSMAN plots,
in the analysis of multivariate data. An attempt is made to highlight different
peculiarities in the sample data that can be found when the practitioner uses these
techniques; a full analysis of the various samples is therefore not envisioned. However,
it will be seen that in using these techniques one can draw solid conclusions about
certain characteristics of the population from which the sample is
drawn.

B.  AN  ANALYSIS  OF  HEALTH  CARE  EXPENSES 

The following data are a good example of a case in which the statistician
can make use of the BOXPLOTTED tables and the PROFILE plots. This is a sample
of panel data and represents the health care cost (per capita hospital expenses) of 14
health service areas throughout the State of California from 1971 to 1975.
Figure 4.1 is a BOXPLOTTED table which displays the average health care expenses
of the areas.

The data was formatted as a two-dimensional array of 14 rows and 5 columns.
Each row of the array represents the average health care expenses of a given area, and
each column corresponds to the average expense for a given year. The data have been
sorted in decreasing order by the first column (year), i.e., the service area with the highest
health care expenses in the first year corresponds to the first row, and so on. In general,
the boxplots in Figure 4.1 give an initial impression of the distribution of each
subsample. Notice that during the first three years the distribution
is definitely skewed to the right, caused by some possible outliers, indicating that some
service areas far exceed the average health care expenses. However, in the last two
years the tendency is the opposite, again with the exception of some possible outliers.

The initial impression given by the boxplots can be further exploited by an
analysis of the flow of the data in order to study the trend of the mean health care
costs, the variance of health care cost, and to identify the occurrence, or recurrence, of
possible outliers. Another possible trend that could be studied is that of the relative
difference in health care expenses. One further area of interest is the trend of
the most and least expensive service areas through this period.

The tendency in the average health care cost during this period was expected to
be an increasing one (during this time, among other things, the inflation rate was
starting to increase very rapidly). This tendency can be seen in Figure 4.1. Notice that
this change is apparently quite linear up to 1974; in 1975 there is a big
jump in the average cost, which probably indicates that, overall, the trend in the
average health expense through this period was not linear. This same tendency is
present in the variance of health care cost, which seems to confirm the nonlinearity in
the average health care expense during this period. In this figure and in Figure 4.2,
where lines are used to trace the flow of the 1971 high and low cost areas through
subsequent years, it is also possible to readily pinpoint those service areas of extreme
average health care cost (possible outliers, as defined in Chapter II). Notice that
health service area number 4 is shown as a possible outlier through all years; it is
always at least 2.2σ from the mean cost. Service area number 3 has the same
tendency. These two areas are then the possible cause of the high variation observed in
the health care cost through this period. They actually represent the Los Angeles and
San Francisco metropolitan areas. In Figure 4.2, one can also follow those service
areas with lower average cost (these areas are joined by line segments at the bottom of
the display). It seems that these areas (numbers 1 and 14) had the lowest cost through
this period, with the exception of service area number 8, which had the lowest cost in 1971.
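
The statement that an area is "at least 2.2σ from the mean cost" can be checked column by column with a computation like the one below. This is a Python sketch for illustration only: the function names are hypothetical, the sample (n - 1) standard deviation is an assumption, and the sigma threshold is used here simply as a screening rule; it is not the Chapter II definition of a possible outlier, which lies outside this section.

```python
import statistics


def deviations_in_sigma(column):
    """Return (x - mean) / s for each x, with s the sample standard deviation."""
    mean = statistics.mean(column)
    s = statistics.stdev(column)  # n - 1 denominator (an assumption)
    return [(x - mean) / s for x in column]


def flag_possible_outliers(column, threshold):
    """Return the 0-based row indices at least `threshold` sigma from the mean."""
    return [i for i, z in enumerate(deviations_in_sigma(column))
            if abs(z) >= threshold]
```

Applying `flag_possible_outliers` to each yearly column in turn shows whether the same rows (service areas) are flagged in every year.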

One can further follow the trend in the change of health care expense for each
respective service area by using the PROFILE plots. In Figure 4.4 each one of the
profile plots portrays the values of each row (health care service area), and the values
are ordered by the magnitude of the first column, as in the BOXPLOTTED tables. The
values of each column are represented in each profile according to the assignment
given in Figure 4.3. In Figure 4.4 it is quite easy to identify the health service areas
that had the highest and lowest health care expenses during this period; as was seen in
the BOXPLOTTED tables, these areas are numbers 4 and 3, and numbers 1 and 14,
respectively. One can also readily pinpoint the area with the most variability in health
care expenses; in this case, notice that area number 13 has a greater change in health
expenditure than areas number 1, 14 and 4 (this last being the most expensive). Notice
that the highest variation in expenditure in area number 13 takes place during 1972 and
1973. This fact is also captured in Figure 4.5 in columns 2 and 3.


Figure  4.3  Assignment  of  Variables  to  the  Profile  of  the 
Per  Capita  Health  Care  Expenses. 


Figure 4.4 Profile Plot of the Per Capita Health Care Expenses,
California Health Service Areas.

Now one can also reinforce the statement about the nonlinearity in the change
of the health expenditure by looking at the relative difference in this variable through
this period. Figure 4.5 portrays the trend in the relative difference in health expenditure
with respect to the first year of study (1971). In Figure 4.6 one can see the same
trend, but now with respect to the previous year. Figure 4.6 definitely shows that the
change in health care expenses has a nonlinear behavior. It is changing linearly during
the first three years, and then at an accelerated, possibly quadratic, rate from then on.
The same trend seems to be shown in Figure 4.5; it is highlighted in the last
column, where the mean of the relative differences jumps from 48.91 to 76.50.

In Figures 4.5 and 4.6 it is also possible to identify those service areas that have
the maximum and minimum relative change. As an example, from 1971 to 1972 area
number 8 had the maximum positive increase, and from 1972 to 1973 the area with the
maximum positive change was area number 4.

C.  AN  ANALYSIS  OF  THE  NEW  YORK  STOCK  EXCHANGE 

In the previous analysis, the data considered consisted of the same type of
commensurable values, i.e., dollars through a period of time (one could consider this as
being multiple short time series data). In contrast with this type of data, the
practitioner can encounter multivariate data that represent different qualitative and
quantitative magnitudes. One example of this type is the data obtained from the stock
markets in the United States. Here again, the practitioner can make use of the
BOXPLOTTED tables as a tool for data analysis. The data to be analyzed was
extracted from the New York Times, and represents the most active stocks (measured by
the number of shares traded) on the New York Stock Exchange for the week ended
August 8, 1986. The data is initially formatted as a two-dimensional array consisting of
40 rows (representing each of the different trading companies) and 6 columns. The
columns correspond to the following variables.

(1)  Volume  of  shares  traded  during  the  week  (in  100,000  units). 

(2)  Closing  price  at  the  end  of  the  week  (in  dollars). 

(3)  Price  change  during  the  week  (in  percentage). 

(4)  Price  change  during  the  last  12  months  (in  percentage). 

(5) Earnings per share during the last 12 months (in dollars).

(6)  Earnings  per  share  during  the  last  12  months  (in  percentage). 


Figure 4.7 shows the initial distributional characteristics (in the form of boxplots)
of each subsample. One of the first visual messages from these plots is the presence of
outliers in each column. Here is where the power of this new graphical technique lies:
one can easily identify those possible outliers by looking at the tabular data adjacent to
the boxplots, although this is easier in the first column since the data is ordered by that
column. As an example, looking at the first boxplot and the first column, it is easy to
identify Owens Corning and the Mobil Corp. as possible outliers: the volume traded by
these two companies is more than 2.7σ from the average volume traded.


Figure 4.7 40 Most Active Stocks for the Week Ended August 8, 1986
(New York Exchange).

Another observation that can be made from this figure is the absence of
statistical correlation among the variables when these are compared in the order shown
in Figure 4.7. In this case, the sample serial rank correlations are obtained by
comparing adjacent columns. Looking at the sample serial rank correlations, one
could conclude that there is no statistical relationship between, as an example, the
volume of shares traded and the price at which the share closed at the end of the week.

Figure 4.8 40 Most Active Stocks for the Week Ended August 8, 1986
(New York Exchange) with Lines Connecting the Two Higher Stocks.

The only positive indication is a relationship between percentage change during the last
week (column 4) and percentage change in the last year (column 5). One can visually
confirm this lack of correlation by identifying the maximum and minimum values of
adjacent columns. For example, the Am Motor Co. had the lowest volume
of shares traded during that week, but the LTV Corp. had the lowest closing price.
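
A rank correlation between adjacent columns can be computed as sketched below. This Python sketch assumes that Spearman's rank correlation (the Pearson correlation of the ranks, with tied values given their average rank) is an acceptable stand-in for the "sample serial rank correlation" shown in the display; the thesis's exact formula is not given in this section, and the function names are hypothetical. Constant columns are not handled.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def adjacent_rank_correlations(table):
    """Rank correlation of each column with the column to its right."""
    columns = list(zip(*table))
    return [spearman(columns[j], columns[j + 1])
            for j in range(len(columns) - 1)]
```

Because only adjacent columns are compared, the values obtained depend on the order in which the variables are placed in the table.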

Notice that one is not only interested in the stock which is the most active during
the week. One is also interested in which stock has the greatest (absolute or relative)
change in price, and whether this is related to other factors like earnings (absolute or
relative). With this in mind, it is possible to follow those stocks that have the largest
value in each of the variables considered in the analysis. Figure 4.8 shows the two
stocks which have this characteristic. These stocks are joined by line segments. It is
easy to see that the Owens Corning Co. is the stock with the highest volume of shares
traded during the week and also the largest price change (in percentage) during the last
twelve months; also, the IBM Co. has the highest closing price at the end of the week
and has the second largest earnings per share (in dollars) during the last year. Likewise,
the practitioner could follow the trend of those stocks with the lowest value in each of
the variables considered, or even the mid-point values.

Differences  have  no  meaning  here  but  it  is  interesting  to  trace  the  movement  of 
the  most  active  stocks  (in  volume)  to  other  indicators  (columns). 

D.  AN  ANALYSIS  OF  AUTOMOBILE  DATA 

The purpose of the following analysis is to explore some important
descriptive characteristics of different types of automobiles and to attempt to
find any relation between these characteristics. As is shown in this analysis, the
STAR plot, CODED SCATTER plot and CODED DRAFTSMAN plot
techniques are paramount experimental statistical tools in this type of analysis. It is
appropriate to mention at this time that one other author has previously made use of
the data treated here and has written an outstanding analysis (see [Ref. 4]). The
purpose here is to demonstrate how one can arrive at the same general conclusions
using these new techniques. The new technique is the enhancement of SCATTER and
DRAFTSMAN plots by coding in other variables.

The data represent three general categories of quantitative and qualitative
characteristics of American and Foreign automobiles of 1979 (the data was obtained
from the Consumer Report Review). These categories are: performance, dimension and
price. The variables under these categories are as follows.

In category one: mileage in miles per gallon, repair records for 1977 and 1978
(rated on a 5-point scale; 5 = best and 1 = worst), turning diameter (clearance required
to make a U-turn) in feet, gear ratio for high gear.

In the second category: headroom in inches, weight in pounds, length in inches,
displacement in cubic inches.

And under the last category: price in dollars. This data was initially formatted
into a two-dimensional array consisting of 74 rows (names of automobiles) and 13
columns. Each column corresponds to one of the variables mentioned above, and
the last column corresponds to an ordinal variable denoting American or Foreign car.
This variable has been added to the original data to demonstrate one of the many
possible applications of the CODED SCATTER plot and of the CODED
DRAFTSMAN plot introduced in this thesis; as an example, one can readily identify whether
a certain deviation from a possible pattern is due to American or Foreign cars.


As a starting point for this analysis, one can study the characteristics of
each one of the individual automobiles. The STAR plot technique was chosen for this
purpose. Figure 4.9 shows the assignment of the twelve characteristics to the rays of
the star. In the study of this type of data, it is interesting to highlight the favorable
characteristics of each automobile. So, as in Chambers [Ref. 1], the longer the ray of the
star, the more positive that attribute is for the respective automobile. To make price,
turning diameter and gear ratio favorable, these variables were multiplied by -1 (i.e.,
the longer the ray corresponding to price, the less expensive the car is). The star is
arranged in such a way that the rays corresponding to the cost and performance
categories point upward and horizontally, and those rays pointing downward
correspond to variables closely related to the dimensions of the automobiles. Appendix
C shows the complete STAR plots for the 74 automobiles.
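
The sign change described above, followed by a rescaling of each variable to a ray length, can be sketched as follows. This is an illustrative Python sketch and not the thesis's APL code: the function name `star_rays` is hypothetical, and the min-max rescaling of every column to the interval [floor, 1] is an assumption about how ray lengths are derived from the data.

```python
def star_rays(table, negate=(), floor=0.1):
    """Scale each column of a row-major table to ray lengths in [floor, 1].

    Columns whose index is in `negate` are multiplied by -1 first, so that
    a longer ray always means a more favorable value.
    """
    columns = [list(col) for col in zip(*table)]
    scaled = []
    for j, col in enumerate(columns):
        if j in negate:
            col = [-v for v in col]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            scaled.append([1.0 for _ in col])  # constant column: full-length rays
        else:
            scaled.append([floor + (1 - floor) * (v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled)]
```

With made-up rows of (price, miles per gallon) and `negate={0}`, the cheapest car receives the longest price ray, which matches the convention described in the text.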

The array of stars is ordered by weight, the first and last stars
corresponding to the lightest and heaviest automobiles respectively. Figure 4.10
displays a summary of Appendix C, showing the 10 heaviest and 10 lightest
automobiles. The idea behind this arrangement is, as commonly accepted, that weight
is positively correlated with safety. Note the switch between the first (Honda Civic) and
last (Lincoln Continental). For the first of these all positive values are above the line;
for the latter this is switched. Note too that the variable of greatest interest to
Consumer Reports, Repair 78, is the vertical ray.

In Figure 4.10, it is easy to see that nine out of the ten lightest cars, in the top
panel, are of foreign make, the exception being the Ford Fiesta. Also, the 10
heaviest cars are all American. From the STAR plot of Figure 4.10, one can also
compare other characteristics among these automobiles. As an example, in terms of
the price variable alone, notice that among the 10 heaviest cars there are
4 American cars that are inexpensive compared with most of the lighter foreign cars
(these American cars are the Mercury Cougar and Cougar XR-7, the Buick Electra and,
to a lesser degree, the Oldsmobile 98). Also, in terms of repair records (of 1977 and 1978), those
American cars among the heavier ones compare with the foreign cars in the
other group. The information is abundant in these plots. However, when there are
many variables involved in the analysis, it is questionable whether the practitioner can
actually capture the behavior of one variable alone or the joint behavior of two or
more variables. In this case, one would like to see if there is any relation (linear or
of another type) between price and weight or, say, displacement and price (it is difficult to
identify any deviation, if it exists, from a possible relation in the STAR plot). In this
type of situation, once the practitioner has an initial impression of the data, it is
time to make use of other exploratory data analysis techniques, such as CODED
SCATTER plots and CODED DRAFTSMAN plots.

To continue the analysis, it was desired to study any possible relation among the
price, mileage per gallon, weight and displacement of the automobiles and to compare
how American cars stand against the Foreign cars. The relations between these
variables are examined in Figure 4.11 by using a CODED DRAFTSMAN plot. It was
expected to see positive correlation between displacement and weight. This can easily
be seen in plot position 2,2 of Figure 4.11. The two possible outliers in plot 2,2 of
Figure 4.11 show that there are two American cars that stand favorably among all
others. They are lighter cars with high displacement. From the figures in Appendix C
these two automobiles were identified as the Chevrolet Chevette and the Buick Opel. In
terms of price, it is also possible to conclude from plot position 3,3 of Figure 4.11 that
there is a negative relation, as expected, between price and weight. Notice, in plot
position 2,1, that there seem to be two types of subsamples within the data, one of
foreign cars and the other of American cars (the foreign cars standing favorably against
the American ones); however, both subpopulations have the same trend, namely, that
weight increases with price. There are a couple of interpretations of this plot, besides the
obvious dichotomy between American and Foreign autos. One is that if you want a
heavy car, you will have to pay more if you also want it foreign made.

An expansion of Figure 4.11 is given by the CODED SCATTER plot, which
has additional variables coded in as symbol type and size. Looking at price versus
m.p.g. in the CODED SCATTER plot of Figure 4.12, one can confirm the idea that
the higher the price of the automobile, the fewer miles per gallon are expected. Notice that
with this figure it is possible to analyze four variables at the same time: price and
miles per gallon being the axes, and weight and nationality the coded variables. It is
interesting to notice that one American and one Foreign car tend to deviate from the
norm. The American car is the Cadillac Seville, with very high price, quite heavy, but
a relatively good mileage; the Foreign car, the V.W. Rabbit (Diesel), is at the opposite
side of the spectrum. In the middle of the plot is a medium-priced foreign car with very
good mileage. This is a BMW 320i.

Figure 4.12 CODED SCATTER Plot of Automobile Data
Price vs MPG (A = American, F = Foreign and Size = Weight)


E.  AN  ANALYSIS  OF  CONTRACT  DATA 

The purpose of this analysis is to demonstrate other possible applications of the
CODED SCATTER plot graphical technique as a tool in exploratory data analysis.
It is also appropriate to emphasize that the data considered in this section has been
amply analyzed by other authors ([Ref. 4]), and again the purpose is only to highlight
the use of this graphical technique.

The data consisted of 177 contracts (rows), which were authorized by the
Department of Defense during the period 1949 through 1963. The columns consist
of 11 variables of possible interest to the Department of Defense on how it has
interfaced on a contractual level with the private sector of manufacturers. The data
represent contracts let with 23 major contractors during this period, and include
information concerning 7 types of manufactured products, ranging in complexity from
drone aircraft to missiles and helicopters. The 11 variables are listed below:

(1)  Deviation  from  target  cost  (percent). 

(2) Months to complete the contract.

(3)  Target  profit  of  manufacturer  (percent). 

(4)  Sharing  ratio  (percent). 

(5)  Ceiling  price  (percent  of  target  price). 

(6)  Target  cost. 

(7)  Number  of  items  produced  in  the  contract. 

(8) Number of contracts let that year.

(9)  Year  the  contract  was  signed. 

(10)  Contractor  awarded  the  contract. 

(11)  Type  of  system. 

Due to the diversity of the data and the purpose of this section, it was decided to
narrow the objective of the analysis to a single issue, which is probably the most
important to the Department of Defense: an attempt will be made to see if there is any
increase (or decrease) in the deviation from the manufacturer's target cost through time.
The deviation considered to be the most significant is the positive one,
since this phenomenon represents additional expenditure for the government.
Thus, the task is to try to find a possible cause of this increase.

Among the other 10 factors, it was hypothesized that the year in which the
contract was signed and the time (in months) to complete the contract had significant
influence on the deviation from the manufacturer's original target cost. The variable year
signed was considered since the period of study includes an event that had significant
impact on the US economy: the Korean War. It was therefore expected that smaller
contractors, not really prepared to react to the contingency of war production, would
be less capable of making accurate predictions. Figure 4.13 shows the display of the
year the contract was signed versus the deviation from target cost. It was also
expected that contractors increasing output above normal production would likewise be
affected in their prediction capabilities. The hypothesis about the time to complete the
contract is based on a simple idea: the wider the interval of time for which the
prediction is made, the lower the probability that the prediction is accurate. It was also
desired to see the major trend in this deviation for the major contractors, since these
were probably of greatest interest to the government. In Figures 4.13 and 4.14 three
major contractors were selected as being of relative importance: Lockheed, Douglas and
Grumman. These three are coded by their initial letters. It is easy to see that the actual
year that the contract was signed does not really influence the deviation from target
cost; the deviations are evenly distributed across the period of interest. However, notice
that during 1951 and 1952 (Korean War period) the deviations are mainly on the
negative side (probably reflecting the influence of patriotism) and thereafter are evenly
distributed.


Figure 4.13 Year Signed vs Dev. From Target Cost, Contract
Data (L = Lockheed, G = Grumman, D = Douglas, o = Others).


The  other  variable  of  interest  was  then  considered,  namely  the  time  to  complete 
the  contract.  The  range  of  this  variable  is  from  around  15  months  to  130  months. 


[Panels: MONTHS TO COMPLETE (< 40) VS DEV. FROM TARGET COST; MONTHS TO COMPLETE (40-70) VS DEV. FROM TARGET COST]

Figure 4.14 Months to Complete vs Dev. Target Cost, Contract
Data (L = Lockheed, G = Grumman, D = Douglas, o = Others).

Figure 4.14 shows cost deviation for the contracts that took less than 40 months, from
40 to less than 70 months, and 70 or more months, respectively. There is clearly some
form of positive relation between cost deviation and completion time for the contracts
that took more than 70 months to complete, which confirms the initial hypothesis.
There are some exceptions to this conclusion, mainly contracts that were given to two
of the largest contractors, Lockheed and Grumman, and possibly to two smaller ones.
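
The three-way split used for Figure 4.14 amounts to binning the contracts by completion time before plotting. A minimal sketch of that grouping is given here in Python rather than in the APL of the thesis programs; the names `duration_band` and `group_by_band` and the sample records are hypothetical illustrations, not the contract data.

```python
from collections import defaultdict


def duration_band(months):
    """Return the Figure 4.14 band for a contract (cut points 40 and 70 months)."""
    if months < 40:
        return "under 40"
    if months < 70:
        return "40 to under 70"
    return "70 or more"


def group_by_band(contracts):
    """Group (months_to_complete, cost_deviation) pairs by duration band."""
    bands = defaultdict(list)
    for months, deviation in contracts:
        bands[duration_band(months)].append(deviation)
    return dict(bands)


# Made-up records for illustration only; they are not taken from the thesis.
sample = [(15, -2.0), (39, 0.5), (40, 1.0), (69, -0.3), (70, 4.2), (130, 9.1)]
grouped = group_by_band(sample)
```

Each list in `grouped` would then be plotted against cost deviation in its own panel.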

These plots clearly demonstrate the care that must be taken in the analysis of
single scatter plots: one scatter plot portrays only the isolated relationship between
two variables and may not indicate a causal relationship. One should make use of
different exploratory data analysis techniques in an attempt to discover possible trends
in the data being analyzed.
APPENDIX A

COMPUTER  PROGRAMS 


1.  APLGRAFS  EXEC. 

This exec program presents a menu of all the programs available in the APL
workspace APLGRAFS. After the selection is made, the exec loads the necessary
workspaces and prompts the user to enter the name of the selected program.

&TRACE
SET BLIP *
-ONE
CLRSCRN
&TRACE
&TYPE
&TYPE YOU HAVE THE FOLLOWING PROGRAMS TO USE
&TYPE
&TYPE (1) STAR AND PROFILE PLOTS
&TYPE (2) BOX PLOTTED TABLES
&TYPE (3) SYMBOLIC SCATTER PLOTS
&TYPE (4) DRAFTSMAN DISPLAY
&TYPE (5) LOWESS
&TYPE (6) EXPLANATION ON THESE FUNCTIONS
&TYPE (7) QUIT
&TYPE
&TYPE TYPE THE NUMBER CORRESPONDING TO THE PROGRAM YOU WANT
&READ VAR &OPT
&IF &OPT = 7 &GOTO -FINAL
&IF &OPT < 1 &GOTO -ERROR1
&IF &OPT > 6 &GOTO -ERROR1
* CP DEFINE STORAGE 2048K
* &STACK I CMS
CP TERMINAL APL ON
&STACK )LOAD GRAFSTAT
&IF &OPT = 2 &GOTO -TWO
&IF &OPT = 3 &GOTO -THREE
&IF &OPT = 4 &GOTO -FOUR
&IF &OPT = 5 &GOTO -FIVE
&IF &OPT = 6 &GOTO -SIX
&STACK 'NOW LOADING , DONT TOUCH YOUR KEYBOARD'
&STACK )PCOPY APLGRAFS GSTARPLOT GDEMO
&STACK )PCOPY 990 CMSIO
&STACK ' '
&STACK 'FOR A DESCRIPTION OF THESE FUNCTIONS TYPE : '
&STACK ' '
&STACK ' INSTRUCTIONIONS '
&STACK ' '
&STACK ' '
&STACK ' TO EXECUTE THE FUNCTION STARPLOT TYPE : '
&STACK ' '
&STACK ' STARPLOT '
&STACK ' '
&GOTO -SEVEN
-TWO &IF &OPT > 2 &GOTO -THREE
&STACK 'NOW LOADING , DONT TOUCH YOUR KEYBOARD'
&STACK )COPY APLGRAFS GBOXPLOTAB GDEMO
&STACK )PCOPY 990 CMSIO
&STACK ' '
&STACK 'FOR A DESCRIPTION OF THESE FUNCTIONS TYPE : '
&STACK ' '
&STACK ' INSTRUCTIONIONS '
&STACK ' '
&STACK ' '
&STACK ' TO EXECUTE THE FUNCTION BOXPLOTAB TYPE : '
&STACK ' '
&STACK ' BOXPLOTAB '
&STACK ' '
&GOTO -SEVEN
-THREE &IF &OPT > 3 &GOTO -FOUR
&STACK 'NOW LOADING , DONT TOUCH YOUR KEYBOARD'
&STACK )PCOPY APLGRAFS GSCATPLOT GDEMO
&STACK )PCOPY 990 CMSIO
&STACK ' '
&STACK 'FOR A DESCRIPTION OF THESE FUNCTIONS TYPE : '
&STACK ' '
&STACK ' INSTRUCTIONIONS '
&STACK ' '
&STACK ' '
&STACK ' TO EXECUTE THE FUNCTION SCATPLOT TYPE : '
&STACK ' '
&STACK ' SCATPLOT '
&STACK ' '
&GOTO -SEVEN
-FOUR &IF &OPT > 4 &GOTO -FIVE
&STACK 'NOW LOADING , DONT TOUCH YOUR KEYBOARD'
&STACK )PCOPY APLGRAFS GDRAFTSMAN GDEMO
&STACK )PCOPY 990 CMSIO
&STACK ' '
&STACK 'FOR A DESCRIPTION OF THESE FUNCTIONS TYPE : '
&STACK ' '
&STACK ' INSTRUCTIONIONS '
&STACK ' '
&STACK ' '
&STACK ' TO EXECUTE THE FUNCTION DRAFTSMAN TYPE : '
&STACK ' '
&STACK ' DRAFTSMAN '
&STACK ' '
&GOTO -SEVEN
-FIVE &IF &OPT > 5 &GOTO -SIX
&STACK 'NOW LOADING , DONT TOUCH YOUR KEYBOARD'
&STACK )PCOPY APLGRAFS GLOWESS GDEMO
&STACK )PCOPY 990 CMSIO
&STACK ' '
&STACK 'FOR A DESCRIPTION OF THESE FUNCTIONS TYPE : '
&STACK ' '
&STACK ' INSTRUCTIONIONS '
&STACK ' '
&STACK ' '
&STACK ' TO EXECUTE THE FUNCTION LOWESS TYPE : '
&STACK ' '
&STACK ' LOWESS '
&STACK ' '
&GOTO -SEVEN
-SIX CLRSCRN
&CONTROL OFF
&BEGTYPE -PAPA
THIS  WORKSPACE  CONTAINS  PROGRAMS  THAT  MAY  BE  USED  AS  EXPLORATORY 
DATA  ANALYSIS  TOOLS.  PROGRAMS  THAT  ARE  USED  TOGETHER  ARE 
CONTAINED  IN  GROUPS. 

THE GROUPS CURRENTLY AVAILABLE ARE GDEMO, GSCATPLOT, GBOXPLOTAB,
GSTARPLOT, GDRAFTSMAN AND GLOWESS, WHERE THE G STANDS FOR GROUP.

IF YOU HAVE COPIED THE WHOLE WORKSPACE APLGRAFS YOU CAN SEE A
LIST OF THESE GROUPS AT ANY TIME BY DROPPING INTO APL AND TYPING :


     )GRPS


GROUPS  : 


GDEMO.       THIS GROUP CONTAIN SOME DATA SETS TO BE USED
             FOR ILLUSTRATION BY THE PROGRAMS IN THIS WS.

GSCATPLOT.   THIS GROUP CONTAIN ALL OF THE PROGRAMS REQUIRED
             TO PRODUCE SYMBOLIC SCATTER PLOT OF TWO OR MORE
             DIMENSIONAL DATA. A BASIC DISCUSSION OF THESE
             DISPLAYS IS CONTAINED IN 'GRAPHICAL METHODS FOR
             DATA ANALYSIS' BY CHAMBERS (PAGE 157) .

             TO EXECUTE THIS PROGRAM TYPE :

                  SCATPLOT

             AND THEN ANSWER THE QUESTIONS.

             YOU WOULD NEED THE FOLLOWING TWO DIMENSIONAL
             ARRAY :

             - ARRAY OF DATA ( IN APL INSIDE THE
               WS OR OUTSIDE AS A FORTRAN FILE)

             FOR A DEMO USE THE FOLLOWING ARRAY :

                  DATA      --> CALHOS

             CALHOS CONSISTS OF COST PER PATIENT IN 14 GEOGRAPHICAL
             DISTRICTS (ROWS) OF CALIFORNIA OVER 5 YEARS (COLUMNS).

GBOXPLOTAB.  THIS GROUP CONTAINS ALL OF THE PROGRAMS REQUIRED
             TO PRODUCE BOX PLOTTED TABLES (A COMBINATION OF
             BOX PLOTS AND A TABLE WITH THE ORIGINAL DATA ON
             THE SAME DISPLAY). TO EXECUTE THIS PROGRAM TYPE :

                  BOXPLOTAB

             AND THEN ANSWER THE QUESTIONS.

             YOU WOULD NEED THE FOLLOWING TWO DIMENSIONAL
             ARRAYS :

             - ARRAY OF DATA ( IN APL INSIDE THE
               WS OR OUTSIDE AS A FORTRAN FILE ).
             - ARRAY OF NAMES OF COLUMNS (AN ARRAY OF
               DIMENSION [NCOL,20])
             - ARRAY OF NAMES OF ROWS (AN ARRAY OF
               DIMENSION [NROW,20])

             IF YOU DONT HAVE THE ARRAYS OF NAMES THE PROGRAM
             WILL ASK YOU TO ENTER THE NAMES ONE BY ONE.

             FOR A DEMO USE THE FOLLOWING ARRAYS :

                  DATA      --> CALHOS
                  ROW NAMES --> CALHOSR
                  COL NAMES --> CALHOSC

GSTARPLOT.   THIS GROUP CONTAINS ALL OF THE PROGRAMS REQUIRED
             TO PRODUCE STAR AND PROFILE PLOTS OF TWO OR MORE
             DIMENSIONAL DATA. A BASIC DISCUSSION OF THESE
             DISPLAYS IS CONTAINED IN 'GRAPHICAL METHODS FOR
             DATA ANALYSIS' BY CHAMBERS (PAGES 158-163)

             TO EXECUTE THIS PROGRAM TYPE :

                  STARPLOT

             AND THEN ANSWER THE QUESTIONS.

             YOU WOULD NEED THE FOLLOWING TWO DIMENSIONAL
             ARRAYS :

             - ARRAY OF DATA ( IN APL INSIDE THE
               WS OR OUTSIDE AS A FORTRAN FILE)
             - ARRAY OF NAMES OF COLUMNS (AN ARRAY OF
               DIMENSION [NCOL,20])
             - ARRAY OF NAMES OF ROWS (AN ARRAY OF
               DIMENSION [NROW,20])

             IF YOU DONT HAVE THE ARRAYS OF NAMES THE PROGRAM
             WILL ASK YOU TO ENTER THE NAMES ONE BY ONE.

             FOR A DEMO USE THE FOLLOWING ARRAYS :

                  DATA      --> CARS
                  ROW NAMES --> CARSR
                  COL NAMES --> CARSC

             CARS IS THE CAR REPAIR DATA GIVEN BY CHAMBERS,
             ET AL.

GDRAFTSMAN.  THIS GROUPS CONTAINS ALL OF THE PROGRAMS REQUIRED
             TO PRODUCE DRAFTSMAN DISPLAYS OF TWO OR THREE
             DIMENSIONAL DATA. A BASIC DISCUSSION OF THESE
             DISPLAYS IS CONTAINED IN 'GRAPHICAL METHODS FOR
             DATA ANALYSIS' BY CHAMBERS (PAGES 136-140)

             DETAILED EXPLANATIONS OF THESE PROGRAMS ARE
             CONTAINED IN 'DRAFTSMAN DISPLAY; A GRAPHICAL
             EXPLORATORY DATA ANALYSIS TECHNIQUE' AN NPS THESIS
             BY CAPT. MALCOLM JOHNSON, USA.

             THESE PROGRAMS ARE COMPLETELY INTERACTIVE AND CAN
             BE INITIATED BY TYPING :

                  DRAFTSMAN

             AND THEN ANSWER THE QUESTIONS.

             YOU WOULD NEED THE FOLLOWING TWO DIMENSIONAL
             ARRAYS :

             - ARRAY OF DATA ( IN APL INSIDE THE
               WS OR OUTSIDE AS A FORTRAN FILE)
             - ARRAY OF NAMES OF COLUMNS (AN ARRAY OF
               DIMENSION [NCOL,20])
             - ARRAY OF NAMES OF ROWS (AN ARRAY OF
               DIMENSION [NROW,20])

             IF YOU DONT HAVE THE ARRAYS OF NAMES THE PROGRAM
             WILL ASK YOU TO ENTER THE NAMES ONE BY ONE.

             FOR A DEMO USE THE FOLLOWING ARRAYS :

                  DATA      --> CARS
                  ROW NAMES --> CARSR
                  COL NAMES --> CARSC

GLOWESS.     THIS GROUP CONTAIN ALL OF THE PROGRAMS REQUIRED TO
             USE THE ROBUST LOCALLY WEIGHTED REGRESSION SCATTER
             PLOT SMOOTHING TECHNIQUE DESCRIBED IN 'GRAPHICAL
             METHODS FOR DATA ANALYSIS' BY CHAMBERS (PAGE 121).
             DETAILED EXPLANATION OF THESE PROGRAMS IS
             PRESENTED IN 'LOCALLY WEIGHTED REGRESSION AND
             SCATTER PLOT SMOOTHING; A GRAPHICAL EXPLORATORY
             DATA ANALYSIS TECHNIQUE' AN NPS THESIS BY
             CDR GARY W MORAN, USN.

             THESE PROGRAMS ARE COMPLETELY INTERACTIVE AND
             CAN BE IMPLEMENTED BY TYPING :

                  LOWESS

             AND THEN ANSWER THE QUESTIONS.

             YOU WOULD NEED THE FOLLOWING TWO DIMENSIONAL
             ARRAYS :

             - ARRAY OF DATA ( IN APL INSIDE THE
               WS OR OUTSIDE AS A FORTRAN FILE)
             - ARRAY OF NAMES OF COLUMNS (AN ARRAY OF
               DIMENSION [NCOL,20])
             - ARRAY OF NAMES OF ROWS (AN ARRAY OF
               DIMENSION [NROW,20])

             IF YOU DONT HAVE THE ARRAYS OF NAMES THE PROGRAM
             WILL ASK YOU TO ENTER THE NAMES ONE BY ONE.

             FOR A DEMO USE THE FOLLOWING ARRAYS :

                  DATA      --> CARS
                  ROW NAMES --> CARSR
                  COL NAMES --> CARSC
-PAPA
&GOTO -ONE
-SEVEN EXEC APLGST
&EXIT 100
-ERROR1 &TYPE YOUR VALUE HAS TO BE BETWEEN 1 AND 6 TRY AGAIN
&GOTO -ONE
-FINAL &EXIT 100
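
The branching in the exec can be summarized as a validate-then-dispatch step. The following is a simplified sketch in Python, not a translation of the exec: the function name `dispatch` and the `PROGRAMS` table are assumptions introduced for illustration, the group and function names are those that appear in the listing, and the sketch uses )PCOPY for every group.

```python
# Menu option -> (APL group copied from APLGRAFS, function the user is told to type).
PROGRAMS = {
    1: ("GSTARPLOT", "STARPLOT"),
    2: ("GBOXPLOTAB", "BOXPLOTAB"),
    3: ("GSCATPLOT", "SCATPLOT"),
    4: ("GDRAFTSMAN", "DRAFTSMAN"),
    5: ("GLOWESS", "LOWESS"),
}


def dispatch(option):
    """Return the lines the menu would stack or type for a given option number."""
    if option == 7:
        return ["QUIT"]
    if option < 1 or option > 6:
        return ["ERROR: YOUR VALUE HAS TO BE BETWEEN 1 AND 6 TRY AGAIN"]
    if option == 6:
        return ["HELP"]
    group, function = PROGRAMS[option]
    return [
        ")LOAD GRAFSTAT",
        f")PCOPY APLGRAFS {group} GDEMO",
        ")PCOPY 990 CMSIO",
        f"TO EXECUTE THE FUNCTION {function} TYPE : {function}",
    ]
```

A table-driven dispatch replaces the chain of label tests (-TWO through -SIX) with a single lookup, which is the main design difference from the exec.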


2.  APLGRAFS  VSAPLWS 

The following is a description of the contents of the APL workspace APLGRAFS
VSAPLWS, which contains all the functions needed to use the programs described in
this thesis. The workspace contains several groups; each group is related to a
specific program and contains the functions required to execute that program.
Following is a list of the groups and the functions inside them.


Group GBOXPLOTAB

Functions BOXPLOTAB ADMI BOXLINES

Group GDRAFTSMAN

Functions DRAFTSMAN DRASYM DRAFT REPEATCK MINMAX TRANSFORM JJITTER SUB
ADMINS LABELS REGRES REGRES2 LOWS YMAVS MOVS MMOVAV GARY GARY2

Group GLOWESS

Functions REPEATCK LOWESS REGRES2 LOWS DATAINPUT REGRES PLOTQUERY

Group GSCATPLOT

Functions MINMAX TRANSFORM JJITTER SCATPLOT ADMI

Group GSTARPLOT

Functions TRANSFORM STARPLOT ADMI
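
Several functions appear in more than one group, which matters when copying individual groups into a workspace. As an illustration only, the list above can be held in a small Python table and queried for shared functions; the names `GROUPS` and `shared_functions` are hypothetical, and the membership is transcribed from the list as printed.

```python
# Group membership as listed above (transcribed from the thesis appendix).
GROUPS = {
    "GBOXPLOTAB": ["BOXPLOTAB", "ADMI", "BOXLINES"],
    "GDRAFTSMAN": [
        "DRAFTSMAN", "DRASYM", "DRAFT", "REPEATCK", "MINMAX", "TRANSFORM",
        "JJITTER", "SUB", "ADMINS", "LABELS", "REGRES", "REGRES2", "LOWS",
        "YMAVS", "MOVS", "MMOVAV", "GARY", "GARY2",
    ],
    "GLOWESS": [
        "REPEATCK", "LOWESS", "REGRES2", "LOWS", "DATAINPUT", "REGRES",
        "PLOTQUERY",
    ],
    "GSCATPLOT": ["MINMAX", "TRANSFORM", "JJITTER", "SCATPLOT", "ADMI"],
    "GSTARPLOT": ["TRANSFORM", "STARPLOT", "ADMI"],
}


def shared_functions(groups):
    """Map each function found in more than one group to the groups holding it."""
    owners = {}
    for group, functions in groups.items():
        for name in functions:
            owners.setdefault(name, []).append(group)
    return {name: held for name, held in owners.items() if len(held) > 1}


shared = shared_functions(GROUPS)
```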


3.  APL  PROGRAMS. 


This section contains the listings of the APL programs written for this thesis,
and the modified versions of some existing APL programs taken from Johnson
[Ref. 4] and from Moran [Ref. 5].

a.  BOXPLOTTED  tables  (Program  BOXPLOTAB) 


[ 0 ]  BOXPLOTAB ; DAT AO: DATA j IPL : NROW : NNCOL : NNNCOL ; NCOL : PLO ; 
UIND ; YL ; S 2 0 ; DIF : LH EA D ; YN t 0 IND 1 ; UIND 2:ORD;ORDEN ; SORT ; 
ORD Is  XL ; BAS  1 : COL  2  0 ; XX ; ME A ; VAR ; MED: DATA  2 ; SORE ; TX ; LX ; 
SIZ ;  YY ;  CO  A  ;  DAT  AO  1 ;  ORD  UBASO;  CONT  ;DATA  1 
[1]  ADMI
[2]  DATAO←DATA
[3]  IPL←0
[4]  →(PRCD≠'Y')/0
[5]  NROW←1↑⍴DATA
[6]  →(NROW≤50)/L01
[7]  'THE MAXIMUM NUMBER OF ROWS ALLOWED IS 50, TRY AGAIN'
[8]  →0
[9]  L01:NNNCOL←NNCOL←¯1↑⍴DATAO
[10] L02:PLO←UIND←XL←YL←S20←20⍴DIF←' '
[11] 'ENTER THE SCREEN LABEL'
[12] LHEAD←⍞
[13] 'DO YOU HAVE A (NCOL 20 CHARS) MATRIX WITH THE NAMES OF COLUMNS Y/N?'
[14] YN←1↑⍞
[15] →(YN≠'Y')/L03
[16] 'ENTER THE NAME OF THE MATRIX'
[17] UIND1←⎕

[18]  UIND  1*-,  {{NNCOL,  5)  '  '  {UIND  1[;  15]  ) 

[19]  ->1,0 1 5 

[20]  L03  : 1-f-0 

[21]  UIND1+-UIND2*- '  ' 

[22]  L04:  j>J+l 

[23] '  ENTER  THE  LAB£L  FOR  COLUMN  NUMBER  '  ,  9  { I  ) 

[24]  UINDl+UINDlA  20* (520 , □  )  ) 

[25]  +{I<NNC0L)/L  04 

[26]  L0 1 5  :  J-«-0 

[27]  '  DO  YOU  HAVE  A  {NROW  15  CHARS)  MATRIX  WITH  THE  NAMES 
OF  ROWS  YJN?  ' 

[28]  YAM-ltD 

[29]  ->(Y1V*'Y'  ) / L0 14 

[30] '  ENTER  THE  NAME  OF  THE  MATRIX  ' 

[31]  UIND2+U 

[32]  UIND2<rUIND2'L\  15] 

[33]  ->7,0 5 5 

[34]  L014 :  J-«-J  +  l 

[3  5]  'ENTER  THE  LABEL  FOR  ROW  NUMBER  '  ,9  (I) 

[36]  UIND2<rUIND2  .  (15+(C1JS20  )  ) 

[37]  +{I<NROW)/L0m 

[38]  UIND2+{NROW ,15)  UIND  2 

[39]  L0  55:ORD+  NROW 


[40] ' DOU  YOU  WANT  THE  DA TA  ORDERED  B Y  THE  FIRST  COLUMN ?  Y/N' 

[41]  •>(  'W'=YO-e-l*a)/Z,056 

[42]  +{'Y'*Y0)/L 055 

[43]  ORD+VDATAOl si] 

[44]  DATAO+DATAO LORD ; ] 

[4  5]  UIND2+UIND210RD;) 

[46]  L056 : 1+1 

[47]  JOR+DATAO 

[48]  JORL ;  l)<-ORD 

[49]  L057 : I+I+l 

[50]  JORl-.Il+yDATAOZill 

[51]  +{I<NNCOL)/L 057 

[52]  L05:+{NNCOL£6)/L06 


IPL+IPL+1 

NCOL+6 

NNCOL+NNCOL - 6 
-*B07 

506 : NCOL*-NNCOL 
IPL+IPL+ 1 
NNCOL+-0 

L07  ’.DATA+DATAOl;  (((IPL- 1)  6  )+NCOL)l 
JER*-JOR  [  ;  (((IPL- 1)  <o)+ NCOL )] 

CO  NT*- ,  §  (NROW  ,  (NCOL  )  )  (  (  NCOL)+ 1  ) 

DATA1+,§DATA 

BAS0+'  ftfi509C0NT9DATAl909L9BOX;0  1  .29N9PL09LHEAD9XL 
-YL-  ' 

BAS0+BAS0  ,  '  .16  .20  .92  .8599LIN  1 .8  7 .29LIN91  1  09 
0  1  0  09' 

RUN  BAS 0 

COB2  0^(iVCO5  +  l )  20 
XX*  COL  2  0 
1*1 

RHO*-  (NCOL-1  )  0 
LRHOl:  1*1+1 

TIE*-+/(JORl\ I-ll=JORC; II ) 

-*(TIE>  (NROW  2  ))/LRHO  2 

MO[I-l]*l-(6  (+/  ( (e/OJ?[  ;  J-l]  -t70i?[  j  J]  )*2  )  )  (iW?0// 

( (NROW*2 )-l ) ) ) 

*55503 

LRH02  iNl+NROW  (  (  (iV50r/+l  )  2  )*2  ) 

SBO [ J- 1] *  (  (  (  +/(JOR  [;J-1]*2))  -771  )* 0 . 5  )  (  (  (  +/ (JOS  [  ;  I] 

*2 ) )-/71  )*0 . 5  ) 

BBO[I-l]*(  ^/(JOBT;  J-l]  «70B[;I]  )  )-7Vl  )  RHO  [J-l] 

LRH03  :->(!<  (iVCOB  )  )J  LRHOl 

C05S*(C0520 ,1 )  ((  20+S20,  'MM  COM.  .-'  ),  (20  4  vRHO), 
(20  '  '  ) ) 

MEA*-(COL  20,1)  a  20+S20,'MM7V  .  -  '  )  ,  (20  4  vMEAN  DATA)  ) 
VAR*-(COL20 ,1)  ((  20+520,  'VARIANCE  .-'  ),(20  4  3.M5JMC5 
DATA  )  ) 

MED*-(COL 20.1  )  ((  20  +  S20 , 'MEDIAN  ),  (20  4  *(C0520 
MEDIAN  DATA  )  )  ) 

DATA2+§( CORE, ME A, VAR , MED ) 

UIND*- '  OBSERVATION  ' ,UINDlt((IPL-l) 120)+  (NCOL  20)1 
UIND+(COL 20,1  )  05/VB 
SCSi7*0 ,0.85,0.98,0.9 

rx*i 

BX*0 

SIZ*6 

B71S1-  «  --10-XX-YY - UIND:  ; SJZ - S20 - SCRE--LIN  0  ' 

BAS1-BAS1 ,  '  140  -5177  LX  YXV1  0  0^0  10  09' 

YY*C0520  1 
M/7  BAS1 

2>CM*0  ,0.05,0.98,0.15 

YX-e-4 

1*0 

SJZ*5 

MOO  :  I*-I+l 

1  ( J>4  )/A?001 

]  UIND*- (COL 20,1) DATA2 [ ( 5 - J )  ;  ] 

]  YY*-COL20  I 
]  50/7  MSI 
]  7700 

]  A70  0 1  :  5*0 
]  TX+NROW 
1  S/Z*2 
]  BX*1 

]  SCRE*0  ,0.2,0.98,0.85 
]  M01 : 1*1  +  1 

]  UIND*-  (COB  2  0 , 1  )  (  (UIND2  LI ;  ]  ),(5  0  <f07?0[I]  ), 

(20  2  &DATA [I ; ]  )  ) 

]  YY*-COL 20  (  (NROM+1  )-I  ) 

]  50/7  BAS1 




(I<NROW)/M 01 
(YOx'Y1  )/M 02 
PAUSE 

'DO  YOU  WANT  TO  JOIN  WITH  LINES  DATA  POINTS  OF  THE 
SAME  POSITION ' 

( ' y,*l+D)/M02 

BOX :  '  ENTER  THE  POSITION  OF  THE  DATA  POINT  ( ENTER  0 
TO  FINISH )  ' 

(O=0P-fr[]5/MO2 

ZZ+JERLDP ;]  BOXLINES  DATA 
BOX 

MO 2  '.PAUSE 
(NNCOL>0)/L05 
( '  Y ' =ltDJF)/0 

TUMA  :  '  DO  YOU  WANT  TO  SEE  THE  DIFFERENCES  BETWEEN 
COLUMNS  Y/N?  ' 

(  '  Y«*DIF<-l*n)/0 
IPL+0 

NNCOL<-NNNCOL  - 1 
COA+CNNCOL, 1)  NNCOL 

'  DO  YOU  WANT  ABSOLUTE  DIFFERENCES  (A  )  OR  RELATIVE 
DIFF.  (R)' 

{'A'*DIF1+1+U)/TUMAI 

LHEAD+  '  ABSOLUTE  DIFFERENCES  BETWEEN  COLUMNS ' 

'  THE  DIFFER  RELATIVE  TO  THE  FIRST  COLUMN  (F )  OR  THE 
PREVIOUS  (P)  ?' 

(  '  P'  tl+DF+L!}') /TUMA01 

DATAO+  ( DATAOl ;  NNCOL]  - DATAOL ;  (1 +  NNCOL)]  ) 

TUMA  2 

TUMA01:DATA01+§(NNCOL ,NROW)  DATAOL;  1] 

DATAO+]  DATAOl-DATAL;  ( 1  +  NNCOL  )  ] 

CO A+ (NNCOL, 1)  1 
mM2 

rwwi :  LHEAD+ '  RELATIVE  DIFFERENCES  BETWEEN  COLUMNS ' 
'  FPF  DIFFER .  RELATIVE  TO  THE  FIRST  COLUMN  (F)  OR  THE 
PREVIOUS  (P)  ?  ' 

(  'P^ltCD/ZWMll 

MMO+  (  (DATAOL;  NNCOL']  -DATAOL;  (1+  NNCOL  )]  ) 
DATAOL;  NNCOL] )  100 
TUMA  2 

YOTM11  :MM01^(/VWC0£.2VFCW)  DATAOL;!] 

DATAO+  I  ((DAMOl-PAMOC;  (l+MVCOL)]  )  MMOl)  100 
COA+(NNCOL.l)  1 

TUMA 2 : AA+ (NNCOL ,15)  '  PJFF.  PET.  » 

UIND1+,AA,  (2  0  ®(COA)),  ((NNCOL  ,1]  ),(2  0 

9  (  (NNCOL.  1)  (l+MVCOL))) 

UIND2+UIND2LORD;] 

£0  5  5 
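
The RHO lines of the listing appear to compute a rank correlation between adjacent columns of the rank matrix JOR, using the shortcut 1 - 6Σd²/(n(n²-1)) and switching to a product-moment form on the ranks when there are many ties; the scanned listing is too damaged to confirm the details. The following Python sketch shows the general technique under that reading. All names are hypothetical, and it always uses the product-moment form, which equals the shortcut when there are no ties.

```python
def ranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2  # mean rank is (n + 1) / 2 even with ties
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def adjacent_rhos(table):
    """Correlate each column of a row-major table with the column to its right."""
    columns = list(zip(*table))
    return [spearman(columns[j], columns[j + 1]) for j in range(len(columns) - 1)]
```

For a table with k columns, `adjacent_rhos` returns k-1 values, one for each pair of neighbouring columns.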


b.  STAR  plots  and  PROFILE  plots  (Program  STARPLOT) 


[ 0 ]  STARPLOT ; PRCD;  ANS : SCOL ;  SFCW; WCO£ ;  NUP ;  INC :MAX;  SINT 
;COST ; M ; BAS ;I;MIN; SPA ; TC ; PI ; ONE ;R;C; POSN; XAXIS ; P ; 
BASQ ;BAS1 ;N ;X; XX ; TYP 


[1]  JO:'TYPE (S) FOR STAR PLOT OR (P) FOR PROFILE PLOT'
[2]  TYP←1↑⍞
[3]  →((TYP≠'S')∧(TYP≠'P'))/JO
[4]  ADMI
[5]  →(PRCD≠'Y')/0
[6]  ONE←I←0

[7]  NCOL+- l+(  DATA ) 

[8]  NROW+1 + ( DATA ) 

[9]  'DO  YOU  HAVE  A  (NROW  20  CHARS  )  MATRIX  WITH  THE  NAMES  OF 
ROWS  Y/N?  ' 

[10]  -»•(  1  Y'*(l  +  D))/«700 

[11] '  ENTER  THE  NAME  OF  THE  MATRIX  OF  NAMES  ' 
[12]  N←⎕

[13]  -*J  01 

[14]  JQQ:I+I+V 

[15]  '  ENTER  THE  NAME  FOR  RON  NUMBER  '  ,  9  ( I  ) 

[16]  N+N ,  <20+(C!,  (20  '  '  ))) 

[17]  +(I<NROW)/J 00 

[18]  J01 :1-<-0 

[19]  'CC  (NCOL  2  0  CHARS  )  MATRIX  WITH  THE  NAMES  OF 

COLUMNS  Y/N? ' 

[20]  -*  (  1  Y  '  *  ( 1  +□  )  )/«702 

[21] ’  ENTER  THE  MATRIX  WITH  THE  NAMES  ' 

[22]  NC+O 

[23]  -*-<703 

[24]  c/02  :I*I+1 

[25] '  ENTER  THE  NAME  FOR  COLUMN  NUMBER  '  ,  ( I  ) 

[26]  N+N,  (20+(D, (20  '  '  ))  ) 

[27]  +(I<NCOL)/J 02 

[28]  JO  3:  'DO  YOU  WANT  ALL  COLUMNS  OF  YOUR  MATRIX  OR 

SELECTED  COL.  ALL/SEL? ' 

[29]  ANS+ 1  +  Q 

[30]  +(ANSx'S'  )/K0 1 

[31] '  ENTER  AS  A  VECTOR  THE  SELECTED  COLUMNS  ' 

[3  2]  DATA+DATAL iSCOL+Ol 

[33]  KOI: 'DO  YOU  WANT  ALL  THE  ROWS  OF  YOUR  MATRIX  OR 

SELECTED  ROWS  ( ALL/SEL  )  1 

[34]  ANS+l+E 

[35]  -*(ANS~'S'  )/K 02 

[36] '  ENTER  AS  A  VECTOR  THE  SELECTED  ROWS  ' 

[37]  DATA+DATAISROW+Q;! 

[3  8]  K0  2: TRANSFORM 

[39]  NCOL*-  li(  DATA ) 

[40]  NROW* l  +  (  DATA ) 

[41]  CON  1 :  '  ENTER  NUMBER  OF  PLOTS  PER  SCREEN  (  3  4  OR  5  )  ' 

[42]  iVBP+D 

[43]  5CO/V  (  (NUP>2  )A(/Vf/P<6  )  ) 

[44]  '  NUMBEITOF  PLOTS~MUST  BE  3  4  0i?  5  ,  TRY  AGAIN' 

[4  5]  ->CCW1 

[46]  CON'.INC*  0.95  Wt/P 

[47]  ONE+(NCOL, 1 )  I 
[4  8]  MAX+MIN+NCOL  0 

[49]  +(TYP~  'S'  )/L0 

[50]  XX*-(NCOL,  1  )  X+(  (1  WC0L)  (  (  NCOL)-l  )  ) 

[51]  M*  (  (NCOL  ,  1  5  0  )  .XX, ONE, ONE, XX.  ((NCOL. I')  0) 

[52]  MM  ((2  NCOL),  3  )  M),  [1]  (<2,3  )  (l,X[/VCdL]  ,0,1, 0,0)) 


[53]  -*LL0 

[54]  LO'.SINT*-  lo(o2  ((  NQ0L}-1')')  NCOL 

[55]  COST*2o  ( o2  ((  NCOL) -1  ))  NCOL 

[56]  M*ONE,((NCOL,  1)  (((0.8  COSD  +  l)  2  )  )  ,  (  {NCOL ,  1  ) 

(((0.8  SI/VD  +  l)  2)) 

[57]  AM  (  (2  NCOL)  ,3)  (M,(  (NCOL,  3  )  ( 1 , 0 . 5 , 0 . 5  )  )  )  ) 

[58]  LL0:BAS«-'  0412^93$.  0  .  0  RPyRPWFFVlQLINVLINWFFV 1 

[59]  PW\7  BAS 
[6  0]  A/ZC^J^O 

[61]  ->(TYP=  '  P '  )  / LLO 1 

[62]  M(WC0L,1)  ((CSSP+l)  2)),  ((NCOL,l)  ((SINT+1)  2)) 

[63]  -+L00 

[6  4]  LL01  :M<-(  (NCOL.l  )  (((INCOL)  ((  NCOL  )-l  ))  )-0 . 0  2  )  , 

( (NCOL , 1 ) 0.5) 

[65]  A/VC+90 

[66]  LOO  :I*I+1 

[67]  NAM*-NCU ;] 

[68]  POSN*  (&  (Mil ;  11  ,M  [  J ;  2  ]  )).  '  PP  ' 

[69]  BAS«- '  on2V/VAM90¥C9AWG¥6WOWO¥'  ,POSN ,  '  V.'  FFV ' 

[7 0]  RUN  BAS 

[71]  mNZn+DATAlli&DATAl;  I]  ;  I] 

[7  2]  MAX[I]^DArA[(  1  )  +&DA27I  [  ; I]  ; I] 

[73]  *(I<NCOL)/L 00 

[74]  PAt/SP 

[75]  SPA*-'  ' 




[76]  1*0 

[77]  TC*-NUP 

[7  8]  LOOP^'.TC*TC+NUP 

[  79  PI* 0 .05,  (1-  (liVC-  ( INC  6  )  )  )  ,  (  0 . 0  5  +  ( INC-  ( INC  6  )  )  )  ,  1 

[80]  R* 0 

[81]  LOOP2:R*R+l 




C*0 

L00P1:C*C+1 

1*1+1 

POSN*PI+  <  (INC ,  (  -INC  )  ,  INC ,  (.-INC))  (  (C-l)  ,  (R-l)  ,  (C-l)  , 
(R- 1))) 

XAXIS*NU;1 

P*(DATA [I:  ] -MIN)  (MAX-MIN) 

*(TYP- 1 S 1 )/LO 00 

M*  (1,0,0), [1] ( ONE , XX ,P),[1]((3,3)  (1 ,XINC0L1 ,0,1,0,0)) 
*MO  0  0 

LOOO:M*ONE,((NCOL,1)  (  (  (P  COST)+l )  2  ) ) ,  (  (NCOL ,  1 ) 

(  (  (P  SINT  )  + 1  )  2  )  ) 

M*M, [1] (((2  NCOL). 3)  (M, ((NCOL, 3)  ( 1, 0 . 5 , 0 . 5 ) ) ) ) 
MOOOlBASO*'  nnl2Wfol9.0  .0  RPWRP90FF9P0SN9LIN9LIN90FFV  ' 
BA51-*-'  fl^^XAXIS^OVC^OVeVYPSVWO’?.  5  "0.07  RPVONV' 

RUN  BASO 

RUN  BAS  1 

* (I^NROW ) /ENDO 

* ( (TC+C )>NROW ) /END 

*((C<NUP)a((TC+C)<NROW))/LOOP1 

]  (R<NVP ) / LOOP2 

]  END~. PAUSE 

]  ( (TC+C )<NROW ) /LOOP3 

]  ENDO :PAUSE 
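
The geometry used by STARPLOT can be read from the lines that build P, COST and SINT: each variable is rescaled to the range 0 to 1 with its column minimum and maximum, and each observation is drawn with one ray per variable at equally spaced angles. A minimal Python sketch of that construction follows. The function names are hypothetical, and the multiplication and division signs are assumed where the scanned listing has lost them.

```python
import math


def scale_rows(data):
    """Rescale each column to [0, 1] using its own minimum and maximum.

    Raises ZeroDivisionError if a column is constant.
    """
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


def star_vertices(scaled_row):
    """Place one ray per variable at equal angles, mapped into the unit square."""
    n = len(scaled_row)
    points = []
    for k, p in enumerate(scaled_row):
        angle = 2 * math.pi * k / n
        x = (p * math.cos(angle) + 1) / 2
        y = (p * math.sin(angle) + 1) / 2
        points.append((x, y))
    return points
```

Joining the vertices in order and closing the polygon gives the star for one observation; an observation at the column minimum on every variable collapses to the centre point (0.5, 0.5).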


c.  CODED  SCATTER  plots  (Program  SCATPLOT) 


[  0  ]  SCATPLOT :  QUE 1 !  CX ;  CT ;  I ;  DAM  1:  LHEAD :  LPLOT ;  WCOLj  LABX 
\LABY ;  EXX ; EXY; FOSN: POST; EXPRE ;SYtt;  COL ;  SYZ ; DElsCRI ; 
POSLEG i A liX}Y; SPA ;X;Y; PLOT 1 ; PLOIO ; PLOTLEG 
POSI*0 
AD  MI 

DATA1*DATA 
+END  ( PRCD * ' Y ' ) 

♦OWE  (DIM-2) 

'  IO£/P  DATA  IS  WOI  A  TWO  DIMENSIONAL  ARRAY  ,  SCATPLOT  BEING 
TERMINATED ' 

[  7  ]  '  PLEASE  REFORMAT  YOUR  DATA  AND  START  AGAIN ' 

[8]  +END 

[9]  ONE : DATA*DATA 1 

[10]  SPA*'  ' 

NCOL*~l+  (  DATA  ) 

(POSI*0)/ONE00 
'  ENTER  THE  SCREEN  HEADER  ' 

LHEAD*\3 

ONE 00  :  '  ENTER  THE  PLOT  HEADER  ' 

LPL0T*\n 

'  ENTER  THE  VARIABLE  (COLUMN  )  FOR  THE  X  AXIS  ' 
X*DATAL-,CX*(n 

’  ENTER  THE  LABEL  FOR  THE  X  AXIS  ’ 

LABX*0 

'  DO  YOU  WANT  ALL  THE  VALUES  OF  X  OR  JUST  A  SUB  SAMPLE 
OF  IT  (ALL/SUB)' 

QUE1* l+D 
TWO  (QUEl-'A') 

'  ENTER  AN  APL  EXPRESSION  WITH  THE  RANGE  OF  VALUES  FOR  X  ' 




'E.G.  (DATAZi '  , (9 CX), ' ] 2500  ) a (DATA  l ; ' , (*CX  )  , ' ] 510 00  )  ' 
EXX+0 

DATA*EXX/DATA 

TWO  :  '  ENTER  THE  VARIABLE  (COLUMN  )  FOR  THE  Y  AXIS  ' 




Y+DATA  [ ;  <7Y-«-[]] 

'  ENTER  THE  LABEL  FOR  THE  Y  AXIS  ' 

LABY+\S 

'  DO  YOU  WANT  ALL  THE  VALUES  OF  Y  OR  JUST  A  SUBSAMPLE 
OF  IT  ( ALL/ SUB )' 

QUEU-l  +  a 

TWO  1  ( QUE1='A ') 

'  ENTER  AN  APL  EXPRESSION  WITH  THE  RANGE  OF  VALUES  FOR  Y  ' 

'  E  ,G .  ( DATAL ; '  , (vCY), ' ]5  500  )a (DATA [ ; ' , ($CY), ' ]£1000  )  ' 
EXY-s-D 

DATA+EXY /DATA 
TWOl : Y+-DATA [ ; CY1 
X+DATAhCXl 
J JITTER 
TRANSFORM 
MINMAX 
1+  0 

'ENTER  THE  POSITION  FOR  THE  PLOT  E.G.l  21  22...  ' 
POSI+POSN+ □ 

LOOP  1  (POSI>  1) 

POSN+  0.10.10.80.8 
LOOP! :  J<-J+l 
□«-30  ($1) 

i  i 

'  ENTER  IN  AN  APL  EXPRESSION  FOR  THIS  CATEGORY ' 

' I.E .  (Dm[;4]S.5)A(MW[;8]  =  5)  ' 

'  USE  DATA  AS  THE  NAME  OF  YOUR  VECTOR  ' 

EXPRE+ □ 

'  ENTER  THE  SYMBOL  ' 

SYM<r\n 

'  ENTER  THE  COLOR  ,  I.E.  BLUE » 

coL+a 

'  ENTER  THE  SIZE  ,  AS  A  NUMBER  BETWEEN  1  ( SMALL  )  AND 
12  (.BIG)  1 

sYz+a 

SYMBOLS+SYM, ' ; ' ,COL , ' ; » ,SY Z 
FOUR  <1=1 ] 

PLOT1+ ' p  p 1 0 VXVYV*EXPRE9 ' , SYMBOLS , ' VSPAVSPA9SPAVSPA ' 
PLOTl+PLOTl,  'VPOSNWm  1  0  090  10  0?’ 

RUN  PL0T1 
FIVE 

FOUR:PLOTO+' ap 109X9 YVzEXPREV ' , SYMBOLS , ' VLPLOTVLHEADV 
LABX ' 

PLOTO+PLOTO  ,  '  VLABYVPOSN’WLIN  LX  TXVLIN  LY  TYVl  1  1 
¥0  1  0  0¥’ 

RUN  PLOTO 

FIVE-.+SIX  (POSI>  1) 

'  ENTER  A  LABEL  (DESCRIPTION)  FOR  THIS  CATEGORY  (MAX  25 
CHARS . ) ' 

DESCRI+ 25 t 
DESCRI*-  ( <5  J 

POSLEG<-0 .8,  (0.75-CJ  5)  100] 

PLOT  LEG*- '  a  a  29DESCRI  j  '  ,COL,  '  90VLV0W3VYESVNO9POSLEG  RS 
¥0/V¥' 

RUN  PLOTLEG 

SIX  :  '  DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO  )  ' 

QUEl+l+a 

LOOP  1  (0UE1-'Y') 

(POSI>1)/QUE01 

PAUSE 

END 

UEOli'DO  YOU  WANT  ANOTHER  PLOT  ( YES /NO ) ' 

UE  1-t-l  +  D 
QUE1-'Y' ) / ONE 
PAUSE 
END: 


,DESCRI ,SYM,'  *  (XSYZ) 
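
SCATPLOT asks for one logical expression per category and plots the selected points with that category's symbol. The coding step can be sketched in Python as follows. This is an illustration of the idea, not a translation of the APL: the function `code_points`, the default symbol, and the sample rows are all hypothetical, and the sample values are made up.

```python
def code_points(rows, categories, default="o"):
    """Give each row the symbol of the first category whose predicate it satisfies."""
    coded = []
    for row in rows:
        symbol = default
        for predicate, mark in categories:
            if predicate(row):
                symbol = mark
                break
        coded.append(symbol)
    return coded


# Made-up (contractor, year_signed, deviation) rows for illustration only.
rows = [
    ("Lockheed", 1951, -3.0),
    ("Grumman", 1953, 1.5),
    ("Other", 1952, 0.2),
    ("Douglas", 1955, 2.4),
]
categories = [
    (lambda r: r[0] == "Lockheed", "L"),
    (lambda r: r[0] == "Grumman", "G"),
    (lambda r: r[0] == "Douglas", "D"),
]
symbols = code_points(rows, categories)
```

Taking the first matching category makes the result well defined even if two expressions overlap, which is the situation the program's prompts warn the user to avoid.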




d.  CODED  DRAFTSMAN  plots  (Program  DRAFTSMAN) 


[ 0 ]  DRAFTSMAN ; NCOL :PI:R;C:Y:TN: T2N ; XAXIS : TAXIS : X : LX: TX ; LY 
TY;ANSiF;ROBiYltXl;YSiMiNUM;PRCDiDIM;YMiXMiUMiSKP 

[1]  ADMINS 

[2]  SPA*SYMBOLS*LPLOT*-LHEAD*XAXIS*YAXIS*'  ' 

[3]  SYMBOLS*-'  o  ' 

[4]  EXP*- 1  A  ' 

[5]  +LP1  (PRCD-'Y '  ) 

[6]  ->-0 

[7]  LP1 :*LP2  (DIM> 3) 

[8]  *(LP2 .LP3 ,LP4)[DJM] 

[9]  LP2  :  '  YOUR  DATA  SET  IS  NOT  A  TWO  OR  THREE  DIMENSIONAL 

ARRAY  ' 

[10] '  DRAFTSMAN  IS  BEING  TERMINATED .  PLEASES  REFORMAT  YOUR 

DATA  AND  ' 

[11] '  REINITIATE  DRAFTSMAN ' 

[12]  0 

[13]  LP4 :  N  DRAFT  DATA 

[14]  0 

[15]  LP3  : 

[16]  NCOL*-  1*(  DATA) 

[17]  J JITTER 

[18]  TRANSFORM 

[19]  GARY 

[20]  'DO  YOU  WANT  A  SYMBOLIC  DRAFTSMAN  (YES /NO)' 

[21]  QUE 1-e-l  +  D 

[22]  CON  1  (QUElx'Y'  ) 

[23]  XX*DATA 

[24]  NCOL*DRASYM  DATA 

[25]  LHEAD*-'  ' 




LPLOT*- '  ' 

'  YOU  HAVE  NOW  '  .  ( vNCOL  ),  '  BASIC  VARIABLES  TO  PLOT ' 

CONI :  '  ENTER  NUMBER  OF  PLOTS  PER  SCREEN  (34  OR  5  )  ' 

NUP*U 

CON  ( (NUP>2 )a (NUP<& ) ) 

'  NUMBER  OF  PLOTS  MUST  BE  3  4  OR  5  ,  TRY  AGAIN ' 

CON  1 

CON :  TR*--NUP 
INC*0 . 9  5~NUP 
LOOP 4 :  TR*TR+NUP 
TC*--NUP 

LOOP 3 : TC*TC+NUP 

WI* 0 . 05  ,  (1-  (INC-  (INC  6  )  )  )  ,  (0 . 05+(INC-  (INC  6  )  )  )  ,  1 
P-e-0 

L00P2 :R*R+1 
OO 

Y+DATAL;  (TR+R)] 

L00P1 :  C-*C+1 
X*DATAL:  (TC+C)] 

( (TR+R ) = (TC+C ))/ SKIP 

POSN*WI + ( (INC , ( -INC ) ,INC , ( -INC ) ) ( (C-l ) , (P-1 ) , (C-l ) , 
(P-1  )  )  ) 

XAXIS*N( (TC+C);) 

YAXIS*Nt  (TR+R  ) :  ] 

( (C= 1 )a ( (R-NUP ) v ( (TR+R ) =NC0L ) ) ) /GRAPH 
XAXIS*'' 

(C= 1  ) /GRAPH 
XAXIS*N [ (TC+C ) ; ] 

YAXIS* ' ' 

(  (P=/V£/P)v  (  (TR+R) -NCOL  )  ) /GRAPH 
XAXIS*Y AXIS*'  ' 


[56]  GRAPH ZMINMAX 

[57]  (ANSx'Y' ) /FIN 

[58]  (SMT='M' )/M0V 

[59]  X  LOWS Y 


[60]  SMOOTH*' 049X97; YS90  191 9 . VSPAVSPAyXAXISVYAXISVPOSN ' 

[61]  SMOOTH*  SMOOTH  ,  '  H’LIN  LX  TXVLIN  LY  TYV 1  1  1910  11  0  O' 


RUN  SMOOTH 
SKIP 

MOViMMOVS  YUX] 

YM*UM 

MMOVSXl&Xl 

XM*UM 

SM00TH1*'  p49XsXM9Y;YM90  1919 .  VSPAVSPA’VXAXISVYAXISVPOSN’? ' 
SMOOTH1*SMOOTH1 ,  '  LIN  LX  TXVLIN  LY  TYVl  1  1910  11  0  O' 

RUN  SMOOTH 1 
SKIP 

FIN : BAS* » P  a 1 09X9Y9 ' , EXP ,  '  9 '  , SYMBOLS , ' VLPLOTVLHEADV 
XAXISV ' 

,  '  YAXIS9P0SNVVLIN  LX  TX9LIN  LY  TYVl  1  1 
910 11 0  09' 

RUN  BAS 

SKIP:*((  (TR+R)Z(NCOL))*((TC+C)ZNCOL))/END 

(  (C<NUP)a((TC+C)<NCOL)  )/LOOPl 

( (R<NUP)/\((TR+R)<NC0L))/L00P2 

END:+(ANS='Y' )/SKIP 1 

WI  GARY 2  INC 

SKIP1 : PAUSE 

ERASE 

( ( TC+C)<NC0L)/L00P3 
((TR+R)<NCOL)  /LOOPn 
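
The placement of each panel on the screen follows from the lines that set INC, WI and POSN. A Python sketch of that arithmetic is shown below, assuming that the operators lost in the scanned listing are a division in INC÷6 and 0.95÷NUP and a multiplication between the increment vector and the (C-1),(R-1) offsets. The function name `panel_position` is hypothetical.

```python
def panel_position(row, col, plots_per_side):
    """Return (x_min, y_min, x_max, y_max) for the panel at 1-based (row, col).

    The screen is the unit square; rows are counted from the top.
    """
    inc = 0.95 / plots_per_side      # spacing between panel origins
    width = inc - inc / 6            # panel size, leaving a gap of inc / 6
    x_min = 0.05 + inc * (col - 1)
    y_max = 1 - inc * (row - 1)
    return (x_min, y_max - width, x_min + width, y_max)
```

With three plots per side, for example, `panel_position(1, 1, 3)` places the first panel in the top-left corner, and each further column or row shifts the panel by one increment.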




e. Supporting Sub-programs


FUNCTION ADMI

[0]  ADMI;QR1;QR2
[1]  ⍝
[2]  ⍝ FUNCTION ADMI CALLED BY FUNCTION SCATPLOT, USES
[3]  ⍝ FUNCTION CMSREAD, THIS FUNCTION IS A MODIFIED
[4]  ⍝ VERSION OF THE FUNCTION ADMINS FROM DTNLFNS VSAPLWS.
[5]  ⍝
[6]  PRCD←'Y'
[7]  'IS YOUR DATA SET LOCATED IN THIS WORKSPACE? (YES/NO)'
[8]  QR1←1↑⍞

[9]  LP1  (QRix'Y' ) 




LP 1 :  ' IS  YOUR  DATA  SET  LOCATED :  ' 

1  (1)  IN  AN  APL  WORKSPACE  LOCATED  ON  THIS  DISK  OR  ON  A  DISK ' 
'  THAT  YOU  ARE  LINDED  TO  ' 

'  ( 2  )  IN  A  CMS  FILE  ON  THIS  DISK  OR  ON  A  DISK  THAT  YOU  ARE ' 

'  LINKED  TO ' 

' ( 3 ) NIETHER  ( 1  ) OR  ( 2 ) ABOVE ' 

' ENTER  ( 1 , 2  OR  3 ) • 

QR2*D 

(LP2.LP3,LPu)[QR2] 



[20]  LP 2  :  'TO  TRANSFER  YOUR  DATA  TO  THIS  WORKSPACE:  ' 




(1  )  TYPE  .  .  .  )PCOPY  (WS  NAME )  (DATA  SET  NAME)' 
EXAMPLE:  )PCOPY  DTNLDATA  CARS' 

i 


[24]  '  DATE  AND  TIME  SAVED  INFORMATION  IS  DISPLAYED' 

[25]  '  WHEN  THE  TRANSFER  IS  COMPLETE .  THEN  ENTER  GO  ' 

[26]  '  TO  PROCED  WITH  SCATPLOT' 

[27]  SLADMI*GO 

[28]  GO:'  ENTER  THE  NAME  OF  YOUR  DATA  SET  ' 

[29]  DATA*U 

[30]  DIM*-  DATA 

[31]  END 

[32]  LP 3  :  '  TO  TRANSFER  YOUR  CMS  DATA  FILE  TO  THIS  WORKSPACE ' 

[33] '  ANSWER  THE  FOLLOWING  QUESTIONS  ABOUT  YOUR  DATA  SET ' 

[34]  DATA*CMSREAD 
[3  5]  DIM*  DATA 
[36]  END 


I 


i 


[37]  LP 4  :  '  YOUR  DATA  SET  MUST  BE  STORED  IN  AN  APL  WORKSPACE  OR  ' 
[383  1  IN  A  CMS  FILE  LOCATED  ON  THIS  DISK  OR  ON  A  DISK  TO  WHICH  ' 

[3  9]  'YOU  ARE  LINKED.  SCATPLOT  IS  BEING  TERMINATED  .  PLEASE  ' 

[  40  ]  «  COMPLY  WITH  CONDITIONS  (  1  )  OR  (  2  )  AND  REINITIATE  SCATPLOT ' 

[41]  PRCD+'N' 

[42]  END : 


FUNCTION BOXLINES

[0]  DAT+JQR  BOXLINES  DATA 

[1]  NCOL+  1+  DATA 

[2]  NROW+1+  DATA 

[3]  MX+§( 1+  ( NCQL-1 )) 

[4]  MX+l  ,W+  ,§(2 ,  MX)  (MX, MX) 

[5]  JM+({  1+fc)  .1  )  (0  1  ) 

[6]  JOR+ .(1~((^InROW-1))  JOR+JORlMXl-1)) 

[7]  MX+((NRQW  (  1+MX)),1)  (  (MX  (2  14))  +  (1+MX)  ((21  140), 

(11 140  ) ) ) 

[8]  JM+((MX)  JM),MX,((  MX)  JOR) 

[9]  BAS2-*-«nfil29JM939.0  .  0  RPVRPVONV  0  .20  .98  .85  VLINVLINVOFFV' 

[10]  RUN  BAS2 

[11]  END: DAT* o 




FUNCTION DRASYM

[0] NCOL←DRASYM MATRIX;CI;CV;I;SYM;COL;SYZ;ANS
[1] ' ENTER AS A VECTOR THE VARIABLES (COLUMNS) THAT YOU WISH TO HAVE'
[2] 'IN THE X AND Y AXIS (THE FIRST AND SECOND DIMENSION FOR THE PLOT)'
[3] CI←⎕
[4] N←N[(CI);]
[5] DATA←MATRIX[;CI]
[6] NCOL←⍴CI
[7] I←0
[8] EXP←SYM←COL←SYZ←' '
[9] ' NEXT, YOU HAVE TO ENTER APL EXPRESSION FOR EACH CATEGORY (CODE)'
[10] ' USE XX AS THE NAME OF YOUR ARRAY'
[11] ' '
[12] ' I.E. (XX[;I]>100)∧(XX[;J]=400)'
[13] ' '
[14] ' WHERE I AND J REPRESENT COLUMN NUMBERS BETWEEN 1 AND ',(⍕⍴CI)
[15] ' BE CAREFUL NOT TO OVERLAP VALUES'
[16] ' '
[17] ' WHEN THE PROGRAM ASKS FOR SYMBOLS TYPE ANY (ONE) CHARACTER'
[18] ' FOR COLORS TYPE THE NAME OF THE COLOR I.E. BLUE OR RED'
[19] ' WITH SIZES 1 REPRESENTS SMALL AND 12 BIG'
[20] ' '
[21] LOOP1:I←I+1
[22] ' ENTER THE APL EXPRESSION FOR THE CATEGORY (CODE) NUMBER ',(⍕I)
[23] EXP←EXP,';',⍞
[24] ' ENTER THE SYMBOL'
[25] SYM←SYM,⍞
[26] 'ENTER THE COLOR'
[27] COL←COL,' ',⍞
[28] ' ENTER THE SIZE'
[29] SYZ←SYZ,' ',⍞
[30] ' DO YOU WISH ANOTHER CATEGORY (YES/NO)'
[31] ANS←1↑⍞
[32] →LOOP1×⍳(ANS='Y')
[33] EXP←2↓EXP
[34] SYMBOLS←(1↓SYM),';',(2↓COL),';',(2↓SYZ)


FUNCTION DRAFT


[  0  IN  DRAFT  DATA  i  NCOL:  TR  \TC  \PI  \R\C ;  Y ;  ZW  \T2N ;  XAXIS :  YAXIS ; 
LXiTX;LY:TYiANS;FiR0B;Yl;Xl;YS:M:NUM;NPAGiVAR;MORE;XUi 
X POSN ;YU 

[1] ⍝⍝⍝ DO NOT MOVE OR ERASE; GRAFSTAT FUNCTION HEADER
[2] ⍝⍝⍝ GRAFSTAT WILL NOT ADD A LINE TO THIS FUNCTION WITHOUT THIS HEADER
[3] ' THE THREE DIMENSIONAL DRAFTSMAN DISPLAY IS BUILT ONE VARIABLE AT A'
[4] ' TIME. THE PROGRAM WILL ASK YOU WHICH VARIABLE YOU WANT TO LOOK AT'
[5] ' EACH TIME IT IS READY FOR A NEW ONE. THE DISPLAY PRESENTED FOR EACH'
[6] ' VARIABLE REPRESENTS THAT VARIABLE PLOTTED AGAINST ALL OTHER'
[7] ' VARIABLES PAGE BY PAGE. THAT IS, THE FIRST ROW REPRESENTS THE FIRST'
[8] ' PAGE OF DATA, THE SECOND ROW REPRESENTS THE SECOND PAGE AND SO ON'

[9]  DATA*M 

[10]  SPA*'  ' 

[11]  NCOL* 2  +  (  DATA  ) 

[12]  NPAG*  1M  DATA  ) 

[13] LOOP5:'WHAT VARIABLE DO YOU WANT TO LOOK AT?'

[14] ((⍕(NCOL,1)⍴⍳NCOL),[2](⍕(NCOL,1)⍴' ')),[2]N

[15] VAR←⎕

[16] XU←XU+0.1×XU←⌈/⌈/DATA[;;(VAR)]

[17] (N[(VAR);]),' WILL BE PLOTTED AS THE INDEPENDENT (X VARIABLE)'

[18] ' AND ALL OTHERS WILL BE PLOTTED AS DEPENDENT (Y VARIABLES).'

[19]  J  JITTER 

[20]  TRANSFORM 

[21]  GARY 

[22] CON:' ENTER # OF PLOTS PER SCREEN (3, 4 OR 5)'

[23] NUP←1↑⎕

[24] →((NUP<3)∨(NUP>5))/CON



[25] INC←0.95÷NUP


LOOP4:TR←TR+NUP
TC *  NUP 


C←0

X←DATA[(TR+R);;VAR]


[36] Y←DATA[(TR+R);;(TC+C)]

[37] YU←YU+0.1×YU←⌈/⌈/DATA[;;(TC+C)]

[38] →((VAR)=(TC+C))/SKIP

[3  9]  POSN*PI+  (  (INC ,  (  INC), INCA  INC)  )  (  (C-l  )  ,  (P-1  )  ,  (C-l  )  , 
(P-1  )  )  )  , 

[40] XAXIS←N[(TC+C);]

[41] YAXIS←N[(VAR);]

[42] →((C=1)∧((R=NUP)∨((TR+R)=NPAG)))/GRAPH

[43] XAXIS←' '

[44] →(C=1)/GRAPH

[45] XAXIS←N[(TC+C);]

[46] YAXIS←' '

[47] →((R=NUP)∨((TR+R)=NPAG))/GRAPH

[48] XAXIS←YAXIS←' '

[49] GRAPH:MINMAX

[50] →(ANS≠'Y')/FIN




(SMT='M' ) /MOV 
X  LOWS Y 

SM00TH3+'  P49X9Y;  YS90  191 9 .VSPA9SPA9XAXIS9YAXIS9P0SNV 
LIN  LX  X£/9 ' 

SMOOTH3+SMOOTH3 LIN  LY  YU9 1  1  1910  11  0  O' 

RUN  SMOOTH 3 
SKIP 

MOV'.MMMOVAV  YI&X1 
YM+YMAV 
MMMOVAV  Xl&Xl 
XM+YMAV 

SM00TH13-*-*  r49X;XM9Y;YM90  19l9.9SP/i9SPA9X;iXIS9Yi4X7S9P0SP 
9LIiV  LX  XU 9  ' 

SMOOTH 13+SMOOTH13 ,  '  LIN  LY  Y£/9l  1  1910  11  0  O' 

POP  SMOOTH  13 
SKIP 

FIN:BASIC3<r '  B49X9Y90919 . 9SPA9SP^9XAXJS9Yi9XJS9P0SA79 
LJP  LX  XD9 ' 

BASJC3-e-SASIC3  ,  '  LIP  LY  YD91  1  1910  110  0' 

POP  MSIC3 

SKIP:→(((TR+R)≥(NPAG))∧((TC+C)≥NCOL))/END
→((C<NUP)∧((TC+C)<NCOL))/LOOP1
→((R<NUP)∧((TR+R)<NPAG))/LOOP2
END:→(ANS='Y')/SKIP1
GARY2

SKIP1:PAUSE
ERASE

→((TC+C)<NCOL)/LOOP3
→((TR+R)<NPAG)/LOOP4

' DO YOU WANT TO LOOK AT ANOTHER VARIABLE?'

MORE←1↑⍞

→LOOP5×⍳(MORE='Y')


FUNCTION GRAPHER

C  0  ]  GRAPHER :GR 1; GR2 ; OPS ; ^P53 ; 75 ; 71 ; XI  * X ; 7 ; 4PS3 ; PPOD ; 
DIMiN ;PEG 

[1] ⍝⍝⍝ DO NOT MOVE OR ERASE; GRAFSTAT FUNCTION HEADER
[2] ⍝⍝⍝ GRAFSTAT WILL NOT ADD A LINE TO THIS FUNCTION WITHOUT THIS HEADER
[3] ADMINS
[4] NNN←N
[5] →LP1×⍳(PRCD='Y')
[6] →0
[7] LP1:→LP2×⍳(DIM>3)
[8] →(LP2,LP3,LP4)[DIM]
[9] LP2:' YOUR DATA SET IS NOT A TWO OR THREE DIMENSIONAL ARRAY.'
[10] ' GRAPHER IS BEING TERMINATED. PLEASE REFORMAT YOUR DATA AND'
[11] ' REINITIATE GRAPHER'
[12] →0
[13] LP4:NNN GRAPHER3 M
[14] →0
[15] LP3:
[16] NCOL←¯1↑(⍴DATA)
[17] JITTER
[18] TRANSFORM
[19] RR:' DO YOU WANT TO CONTINUE AND PLOT? (ENTER Y OR N)'
[20] GR1←⍞
[21] →(GR1≠'Y')/0
[22] ' WHAT MATRIX POSITION ARE YOU REPRODUCING?'
[23] GR2←⎕
[24] ' WHAT POSITION ON THE SCREEN?'
[25] GR3←⎕
[26] GARY3
[27] SPA←' '
[28] →(ANS3≠'Y')/L1

[29]  GRFRS*- ' A49X9Y ; YS90  1919. 9SPA9SPA9NNNL (GR2 [2] ) ; ]9 
NNNZ (GR2 [1]  );] » 

[30]  GRFRS+GRFRS , '9GR39LIN91  1 1910  11  0  O' 

[31] RUN GRFRS
[32] →RR

[33]  GRFR+' PH9D AT Al;  (GR2[2]  )19DATAL;  (GR2LH  )]9.919<>9 

SPA -SPA -' 

[34]  GRFR+GRFR, ' 9NNNZ (GR2 [2] ) ; ] 9NNNL (GR2 [1] ):19GR 39 

LIN -LIN 

91  1  1910  110  0' 

[35] L1:RUN GRFR
[36] →RR


FUNCTION GRAPHER3

[0] NNN GRAPHER3 M
[1] ⍝⍝⍝ DO NOT MOVE OR ERASE; GRAFSTAT FUNCTION HEADER
[2] ⍝⍝⍝ GRAFSTAT WILL NOT ADD A LINE TO THIS FUNCTION WITHOUT THIS HEADER
[3] DATA←M
[4] NCOL←¯1↑(⍴DATA)
[5] JITTER
[6] TRANSFORM
[7] RR:'DO YOU WANT TO CONTINUE AND PLOT? (ENTER Y OR N)'
[8] GR1←⍞
[9] →(GR1≠'Y')/0
[10] ' WHAT MATRIX POSITION ARE YOU REPRODUCING?'
[11] GR2←⎕
[12] LIMITS
[13] ' WHAT POSITION ON THE SCREEN?'
[14] GR3←⎕
[15] GARY3
[16] SPA←''
[17] →(ANS3≠'Y')/L1

[18]  GRFRS*- '  A49X9Y  ;  YS90  1919 .9SPA9SPA9NNNL  (GR2  [2]  );] 
9NNNKGR2L11  );]  ' 

[19]  GRFRS+GRFRS ,  '  9GR39LIN9LIN91  1  1910  11  0  0' 

[20] RUN GRFRS

[21] →RR

[22]  LI :  GRFR*-'  PV9DATAZ:  (GR2C21  )J9DATA[;  (GR  2[1]  )]9.91 
9°9SPA9SPA9' 

[23]  GRFR+GRFR,' '  NNNK.GR2Z21  )  ;  ]  9NNNZ  (GR2  [1]  )  ;  19GR39LIN 
9LIN9 1  1  1910  110  0' 

[24] RUN GRFR

[25] →RR


FUNCTION PLOTQUERY

[0] PLOTQUERY
[1] ' '
[2] SPA←' '
[3] 'DO YOU WANT A PLOT OF YOUR LOWESS SMOOTHED CURVE?'
[4] ' (YES OR NO). ENTER NO IF NOT USING GRAFSTAT'
[5] PT←1↑⍞
[6] →END×⍳(PT≠'Y')
[7] ' INPUT X AXIS LABEL'
[8] XAXIS←⍞
[9] ' INPUT Y AXIS LABEL'
[10] YAXIS←⍞
[11] →PL1×⍳(ROB≠'Y')
[12] PHDR←' ROBUST LOWESS SMOOTHING: F = '

[13]  RPLT*-'  a 49X1911  ;YS90  1919.*+  V bo®&99SPA9PHDR9XAXIS9 

YAXIS92 19 ' 

[14]  RPLT*rRPLT,  '  LIN9LIN91  1  190  1  0  0' 


[15] RUN RPLT

[16] VIEW

[17] →PL2

[18] PL1:PHDR←' NON-ROBUST LOWESS SMOOTHING: F = ',⍕F

[19]  NRPLT+'  flU^Xl^YljYS^O  191'?.*+  VAo Q&WSPAVPHDRVXAXISV 

YAXISV 21V 

[20]  NRPLT+NRPLT ,  ' LINVLINVl  1190  10  0' 

[21] RUN NRPLT

[22]

[23] PL2:' DO YOU WANT A PLOT OF |RESIDUALS| VS X?'

[24] ' (YES OR NO)'

[25] QS5←1↑⍞

[26] →END×⍳(QS5≠'Y')

[27] ' DO YOU WANT THIS PLOT SMOOTHED?'

[28] ' (YES OR NO)'

[29] QS6←1↑⍞

[30] XRESID←' |RESIDUALS| '

[31] →PL3×⍳(QS6≠'Y')

[32] X LOWS(|RESY)

[33]  SPPSPAr* '  fl  fl  1 9X9  (  |  RESY ) :  YSVO 
1V1¥. *+  VAo®byVSPAVSPAVXAXISVXRESID9' 

[34]  SRESPLT+SRESPLT,  '  229LINQLINV1  1  1*90  1  0  O’?' 

[35] RUN SRESPLT

[36] PAUSE

[37] →END

[38]  PL3:RESPLT+' nnl9XW(  |  PESY  )*90V19 .  *  +  VAo ®&yWSPAVSPAVXAXISV 

XRESIDV ' 

[39]  RESPLT+RESPLT,  '  229LINVLIN’V1  1190  10  09' 

[40] RUN RESPLT

[41] PAUSE

[42] END:




APPENDIX  B 

SAMPLE  PROGRAM  EXECUTION 


1.  BOXPLOTTED  TABLES 

As mentioned in Chapter III, this program is executed by typing BOXPLOTAB
and answering the queries as follows:

BOXPLOTAB 

IS  YOUR  DATA  SET  LOCATED  IN  THIS  WORKSPACE?  (YES /NO ) 

YES 

ENTER  THE  NAME  OF  YOUR  DATA  SET 

□  : 

STOCK 

ENTER  THE  SCREEN  LABEL 

ACTIVE  STOCKS  FOR  THE  WEEK  ENDED  AUG.  8,  1986 

DO YOU HAVE A (NCOL*20 CHARS) MATRIX WITH THE NAMES OF
COLUMNS Y/N?

YES 

ENTER  THE  NAME  OF  THE  MATRIX 

□  : 

STOCKC 

DO YOU HAVE A (NROW*15 CHARS) MATRIX WITH THE NAMES OF
ROWS Y/N?

YES 

ENTER  THE  NAME  OF  THE  MATRIX 

□  : 

STOCKR 

DO YOU WANT THE DATA ORDERED BY THE FIRST COLUMN? Y/N
YES 

(AT  THIS  POINT  THE  BOXPLOTTED  TABLES  ARE  DISPLAYED  ON  THE 
SCREEN ) 

ENTER  Q  TO  QUIT 
ENTER  E  TO  ERASE  AND  CONTINUE 
ENTER  C  TO  COPY  AND  CONTINUE 
ENTER  CE  TO  COPY ,  ERASE  AND  CONTINUE 
PRESS  ENTER  ONLY  TO  CONTINUE 
CE 

DO  YOU  WANT  TO  JOIN  WITH  LINES  DATA  POINTS  OF  THE  SAME 
POSITION 


ENTER  THE  POSITION  OF  THE  DATA  POINT  (ENTER  0  TO  FINISH  ) 

□  : 

1 

ENTER  THE  POSITION  OF  THE  DATA  POINT  (ENTER  0  TO  FINISH ) 

□  : 

0 


ENTER  Q  TO  QUIT 

ENTER  E  TO  ERASE  AND  CONTINUE 

ENTER  C  TO  COPY  AND  CONTINUE 

ENTER CE TO COPY, ERASE AND CONTINUE

PRESS ENTER ONLY TO CONTINUE

Q 
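
The session above orders the table by its first column before the boxplots are drawn between the columns. As a rough illustration only (written in Python rather than the APL of the thesis, and not taken from the BOXPLOTAB source), the sketch below orders the rows and computes a five-number summary for each column. The names `five_number_summary` and `boxplot_table` are hypothetical, and the hinge rule used here (median of each half, with the middle value included when the count is odd) is an assumption about how the quartiles might be computed.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, lower hinge, median, upper hinge, max) for one column."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the middle value when n is odd
    return (xs[0], median(xs[:half]), median(xs), median(xs[n - half:]), xs[-1])


def boxplot_table(rows):
    """Order the rows by their first column and summarise every column."""
    ordered = sorted(rows, key=lambda row: row[0])
    columns = list(zip(*ordered))
    return ordered, [five_number_summary(col) for col in columns]


if __name__ == "__main__":
    table = [[3, 10], [1, 30], [2, 20]]  # made-up values, not the STOCK data
    ordered, summaries = boxplot_table(table)
    for row in ordered:
        print(row)
    for summary in summaries:
        print(summary)
```

Each summary tuple holds the values a boxplot would need for one column divider; the raw rows stay available for the table itself.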


2. STAR PLOTS

As mentioned in Chapter III, this program is executed by typing STARPLOT
and answering the queries as follows:

STARPLOT 

TYPE  (S  )  FOR  STAR  PLOT  OR  (P )  FOR  PROFILE  PLOT 
S 

IS  YOUR  DATA  SET  LOCATED  IN  THIS  WORKSPACE?  (YES /NO  ) 

YES 

ENTER  THE  NAME  OF  YOUR  DATA  SET 
□  : 

AUTOS 

DO YOU HAVE A (NROW*20 CHARS) ARRAY WITH NAMES OF ROWS Y/N?

YES

ENTER THE NAME OF THE MATRIX OF NAMES

□:

AUTOSR

DO YOU HAVE A (NCOL*20 CHARS) MATRIX WITH THE NAMES OF
COLUMNS Y/N?

Y

ENTER THE MATRIX WITH THE NAMES
□:

AUTOSC

DO  YOU  WANT  ALL  COLUMNS  OF  YOUR  MATRIX  OR  SELECTED  COL  . 

ALL/SEL? 

SEL 

ENTER  AS  A  VECTOR  THE  SELECTED  COLUMNS 

□  : 

⍳12

DO  YOU  WANT  ALL  THE  ROWS  OF  YOUR  MATRIX  OR  SELECTED  ROWS 
(ALL/SEL ) 

ALL 

HOW  MANY  VARIABLES  DO  YOU  WANT  TO  HAVE  TRANSFORMED  ? 

TYPE  0  IF  YOU  WANT  NONE 

□  : 

0 

ENTER  NUMBER  OF  PLOTS  PER  SCREEN  ( 3  4  OR  5  ) 

□  : 

5 

(AT  THIS  POINT  THE  STAR  PLOT  IS  SHOWN  ON  THE  SCREEN ) 


ENTER  Q  TO  QUIT 

ENTER  E  TO  ERASE  AND  CONTINUE 

ENTER  C  TO  COPY  AND  CONTINUE 

ENTER CE TO COPY, ERASE AND CONTINUE

PRESS  ENTER  ONLY  TO  CONTINUE 

CE 
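
To make the geometry of a star plot concrete, the following Python sketch (an illustration, not the STARPLOT code of the thesis) rescales each selected column to the range 0 to 1 and places one ray per variable at equally spaced angles, so that each row of the data becomes one star. The function names `scale_columns` and `star_vertices`, the min-max scaling, and the choice of placing the first ray along the positive x axis are all assumptions made for this example.

```python
import math


def scale_columns(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0.5."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.5 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def star_vertices(scaled_row):
    """Return the (x, y) tip of each ray for one observation."""
    p = len(scaled_row)
    return [
        (r * math.cos(2 * math.pi * k / p), r * math.sin(2 * math.pi * k / p))
        for k, r in enumerate(scaled_row)
    ]


if __name__ == "__main__":
    data = [[1, 10], [3, 20], [2, 30]]  # made-up values, not the AUTOS data
    for row in scale_columns(data):
        print(star_vertices(row))
```

Joining the ray tips of one observation in order gives the star outline; a profile plot would use the same scaled values but lay the variables out along a horizontal axis instead of around a circle.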


3.  PROFILE  PLOTS 

As mentioned in Chapter III, this program is executed by typing STARPLOT
and answering the queries as follows:

STARPLOT 

TYPE  (S  )  FOR  STAR  PLOT  OR  (P )  FOR  PROFILE  PLOT 
P 

IS  YOUR  DATA  SET  LOCATED  IN  THIS  WORKSPACE?  (YES /NO  ) 

YES 

ENTER  THE  NAME  OF  YOUR  DATA  SET 

□  : 

AUTOS 

DO YOU HAVE A (NROW*20 CHARS) ARRAY WITH NAMES OF ROWS Y/N?


ENTER THE NAME OF THE MATRIX OF NAMES
□:

AUTOSR

DO YOU HAVE A (NCOL*20 CHARS) MATRIX WITH THE NAMES OF
COLUMNS Y/N?

Y

ENTER THE MATRIX WITH THE NAMES

□:

AUTOSC

DO  YOU  WANT  ALL  COLUMNS  OF  YOUR  MATRIX  OR  SELECTED  COL  . 
ALL/SEL? 

SEL 

ENTER  AS  A  VECTOR  THE  SELECTED  COLUMNS 

□  : 

⍳12

DO  YOU  WANT  ALL  THE  ROWS  OF  YOUR  MATRIX  OR  SELECTED  ROWS 
( ALL/SEL ) 

ALL 

HOW  MANY  VARIABLES  DO  YOU  WANT  TO  HAVE  TRANSFORMED  ? 
TYPE  0  IF  YOU  WANT  NONE 
□  : 

0 

ENTER  NUMBER  OF  PLOTS  PER  SCREEN  ( 3  4  OR  5  ) 

□  : 

5 

(AT THIS POINT THE PROFILE PLOT IS SHOWN ON THE SCREEN)


ENTER  Q  TO  QUIT 


ENTER  E  TO  ERASE  AND  CONTINUE 
ENTER  C  TO  COPY  AND  CONTINUE 
ENTER  CE  TO  COPY ,  ERASE  AND  CONTINUE 
PRESS  ENTER  ONLY  TO  CONTINUE 

CE 


4.  CODED  SCATTER  PLOTS 

As mentioned in Chapter III, this program is executed by typing SCATPLOT
and answering the queries as follows:


SCATPLOT 

IS YOUR DATA SET LOCATED IN THIS WORKSPACE? (YES/NO)

YES

ENTER THE NAME OF YOUR DATA SET
□:

AUTOS

FROM NOW ON YOUR DATA SET WILL BE CALLED DATA (IN THIS PROGRAM)

ENTER THE SCREEN HEADER
AUTOMOBILE DATA; PRICE VS M.P.G. CITY

ENTER THE PLOT HEADER

USA = A, FOREIGN = F AND WEIGHT = SIZE OF LETTER

ENTER  THE  COLUMN  NUMBER  FOR  THE  VARIABLE  ON  THE  X-AXIS 

□  : 

1 

ENTER  THE  LABEL  FOR  THE  X  AXIS 
PRICE 

DO YOU WANT ALL THE VALUES OF X OR JUST A SUBSAMPLE OF IT (ALL/SUB)
ALL

ENTER THE COLUMN NUMBER FOR THE VARIABLE ON THE Y-AXIS

□:

2

ENTER THE LABEL FOR THE Y AXIS
M.P.G. CITY

DO YOU WANT ALL THE VALUES OF Y OR JUST A SUBSAMPLE OF IT (ALL/SUB)
ALL

HOW MANY VARIABLES DO YOU DESIRE JITTERED?

TYPE 0 IF YOU WANT NONE
□:

0

HOW MANY VARIABLES DO YOU WANT TO HAVE TRANSFORMED?

TYPE 0 IF YOU WANT NONE

□:

0 

ENTER  THE  POSITION  FOR  THE  PLOT  E.G.  1  21  22... 

□  : 

1 

111111111111111111111111111111

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY


I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR
(DATA[;13]=1)∧(DATA[;8]≤2500)

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

A

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK
BLUE

ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

3

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)

USA ≤ 2500 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO) 

YES 

222222222222222222222222222222

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY

I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR

(DATA[;13]=1)∧((DATA[;8]>2500)∧(DATA[;8]≤3000))

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK
BLUE

ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)
2500 < USA ≤ 3000 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO  ) 

YES 

333333333333333333333333333333

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY

I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR
(DATA[;13]=1)∧((DATA[;8]>3000)∧(DATA[;8]≤3500))

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

A

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK
RED

ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

7

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)
3000 < USA ≤ 3500 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO  ) 

YES 

444444444444444444444444444444

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY

I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR
(DATA[;13]=1)∧(DATA[;8]>3500)

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

A

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK


ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

9

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)

USA > 3500 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO  ) 

YES 

555555555555555555555555555555

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY

I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR
(DATA[;13]≠1)∧(DATA[;8]≤2500)

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

F

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK


ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

3

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)
FOREIGN ≤ 2500 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO  ) 

YES 

666666666666666666666666666666

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY

I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR

(DATA[;13]≠1)∧((DATA[;8]>2500)∧(DATA[;8]≤3000))

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

F

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK


ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

5

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)
2500 < FOREI. ≤ 3000 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (YES /NO  ) 

YES 

777777777777777777777777777777

ENTER IN AN APL EXPRESSION FOR THIS CATEGORY

I.E. (DATA[;4]≤.5)∧(DATA[;8]=5)

USE DATA AS THE NAME OF YOUR VECTOR

(DATA[;13]≠1)∧((DATA[;8]>3000)∧(DATA[;8]≤3500))

ENTER THE SYMBOL (ANY LETTER, NUMBER OR SPECIAL CHARACTER)

F

ENTER THE COLOR (WHITE, GREEN, BLUE, TURQUOISE, RED, YELLOW OR PINK


ENTER THE SIZE, AS A NUMBER BETWEEN 1 (SMALL) AND 12 (BIG)

7

ENTER A LABEL (DESCRIPTION) FOR THIS CATEGORY (MAX 25 CHARS.)
3000 < FOREI. ≤ 3500 LB.

DO  YOU  WANT  ANOTHER  CATEGORY  (  YES /NO  ) 

NO 
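
The category expressions entered in this session act as logical masks over the rows of DATA: each one selects the cars that receive a given symbol. The Python sketch below restates the seven categories as ordinary conditions to show that they do not overlap. It is an illustration only. It assumes, from the plot header and the category labels, that column 13 holds an origin code with 1 meaning USA and that column 8 holds the weight in pounds; the function name `categorize` is hypothetical, and the comparison operators follow the reading of the session given above.

```python
def categorize(origin, weight):
    """Return (symbol, label) for one car, or None if no category applies.

    origin: assumed origin code (1 = USA, anything else = foreign)
    weight: assumed weight in pounds
    """
    usa = origin == 1
    categories = [
        ("A", "USA <= 2500 LB.", usa and weight <= 2500),
        ("A", "2500 < USA <= 3000 LB.", usa and 2500 < weight <= 3000),
        ("A", "3000 < USA <= 3500 LB.", usa and 3000 < weight <= 3500),
        ("A", "USA > 3500 LB.", usa and weight > 3500),
        ("F", "FOREIGN <= 2500 LB.", not usa and weight <= 2500),
        ("F", "2500 < FOREI. <= 3000 LB.", not usa and 2500 < weight <= 3000),
        ("F", "3000 < FOREI. <= 3500 LB.", not usa and 3000 < weight <= 3500),
    ]
    matches = [(symbol, label) for symbol, label, hit in categories if hit]
    # The masks are mutually exclusive, so at most one category can match.
    assert len(matches) <= 1
    return matches[0] if matches else None


if __name__ == "__main__":
    for car in [(1, 2400), (1, 3000), (2, 3200), (2, 3600)]:
        print(car, categorize(*car))
```

A foreign car heavier than 3500 lb matches none of the seven expressions; what SCATPLOT does with such a point is not shown in the session, so the sketch simply returns `None` for it.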


5. CODED DRAFTSMAN PLOTS

As mentioned in Chapter III, this program is executed by typing
DRAFTSMAN and answering the queries as follows:


DRAFTSMAN 

IS  YOUR  DATA  SET  LOCATED  IN  THIS  WORKSPACE? 

( YES  OR  NO) 

YES 

ENTER  THE  NAME  OF  YOUR  DATA  SET 

□  : 

AUTOS 

DO  YOU  WANT  ALL  OF  THIS  DATA  OR  JUST  A  SUBSAMPLE  OF  IT  TO 
BE  PRESENTED  IN  THE  DRAFTSMAN  DISPLAY?  ENTER  ( ALL  OR  SUB  ) 

ALL 

DO  YOU  HAVE  A  TWO  DIMENSIONAL  ARRAY  OF  NAMES  FOR  THE  DATA 
WHICH  IS  TO  BE  DISPLAYED  ?  NOTE :  THESE  NAMES  ARE  THE  NAMES 
OF THE VARIABLES REPRESENTED BY THE COLUMNS OF YOUR DATA SET.

( YES  OR  NO) 

YES 

WHAT  IS  THE  NAME  OF  YOUR  ARRAY  OF  VARIABLE  NAMES? 

□  : 

AUTOSC

HOW  MANY  VARIABLES  DO  YOU  DESIRE  JITTERED  ? 

TYPE  0  IF  YOU  WANT  NONE 

□  : 

0 

HOW  MANY  VARIABLES  DO  YOU  WANT  TO  HAVE  TRANSFORMED  ? 

TYPE  0  IF  YOU  WANT  NONE 
□:

0

DO YOU WANT TO FIT A SMOOTHED CURVE
ON ALL DRAFTSMAN PLOTS? ... (YES OR NO)

NO 

DO  YOU  WANT  A  SYMBOLIC  DRAFTSMAN  (  YES /NO  ) 

YES 

ENTER AS A VECTOR THE VARIABLES (COLUMNS) THAT YOU WISH TO HAVE
IN THE X AND Y AXIS (THE FIRST AND SECOND DIMENSION FOR THE PLOT)
□  : 

1118  2 

NEXT, YOU HAVE TO ENTER APL EXPRESSION FOR EACH CATEGORY (CODE)
USE XX AS THE NAME OF YOUR ARRAY
I.E. (XX[;I]>100)∧(XX[;J]=400)

WHERE I AND J REPRESENT COLUMN NUMBERS BETWEEN 1 AND 4
BE CAREFUL NOT TO OVERLAP VALUES

WHEN THE PROGRAM ASKS FOR SYMBOLS TYPE ANY (ONE) CHARACTER
FOR COLORS TYPE THE NAME OF THE COLOR I.E. BLUE OR RED
WITH SIZES 1 REPRESENTS SMALL AND 12 BIG

ENTER THE APL EXPRESSION FOR THE CATEGORY (CODE) NUMBER 1
XX[;13]=1

ENTER THE SYMBOL
A 

ENTER  THE  COLOR 
RED 

ENTER  THE  SIZE 
4 

DO YOU WISH ANOTHER CATEGORY (YES/NO)

YES 

ENTER  THE  APL  EXPRESSION  FOR  THE  CATEGORY  (CODE)  NUMBER  2 
XXL; 

ENTER  THE  SYMBOL 
F 

ENTER  THE  COLOR 
RED 

ENTER  THE  SIZE 
4 

DO YOU WISH ANOTHER CATEGORY (YES/NO)

NO 

YOU  HAVE  NOW  4  BASIC  VARIABLES  TO  PLOT 
ENTER  NUMBER  OF  PLOTS  PER  SCREEN  (  3  4  OR  5  ) 

□  : 

4 

DO  YOU  WANT  TO  FIT  A  SMOOTHED  CURVE 
ON  SELECTED  PLOTS ?  .  .  .  (YES  OR  NO  ) 

NO 


(AT  THIS  POINT  THE  CODED  DRAFTSMAN  PLOT  IS  SHOWN  ON  THE  SCREEN  ) 


ENTER  Q  TO  QUIT 

ENTER  E  TO  ERASE  AND  CONTINUE 

ENTER  C  TO  COPY  AND  CONTINUE 

ENTER  CE  TO  COPY ,  ERASE  AND  CONTINUE 

PRESS  ENTER  ONLY  TO  CONTINUE 

CE 
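
A draftsman display arranges pairwise scatter plots in a grid, and the program asks how many plots to place along each side of the screen. The Python sketch below shows one way such a grid could be split into screens. It is an assumption-laden illustration, not the DRAFTSMAN code: the function name `draftsman_pages` is hypothetical, the full square grid is used rather than only one triangle, and a variable is assumed never to be plotted against itself.

```python
def draftsman_pages(n_vars, per_side):
    """Split the n_vars x n_vars grid of (row, col) variable pairs into screens.

    Each screen holds at most per_side x per_side panels; pairs with
    row == col are skipped.
    """
    pages = []
    for top in range(0, n_vars, per_side):
        for left in range(0, n_vars, per_side):
            page = [
                (r, c)
                for r in range(top, min(top + per_side, n_vars))
                for c in range(left, min(left + per_side, n_vars))
                if r != c
            ]
            if page:
                pages.append(page)
    return pages


if __name__ == "__main__":
    # Four variables with four plots per side, as in the session above.
    for number, page in enumerate(draftsman_pages(4, 4), start=1):
        print(number, page)
```

With four variables and four plots per side everything fits on a single screen; with more variables than plots per side, the grid is walked block by block.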






INITIAL DISTRIBUTION LIST

No. Copies

Defense Technical Information Center
Cameron Station
Alexandria, Virginia 22304-6145

Library, Code 0142
Naval Postgraduate School
Monterey, California 93943-5002

Prof. Peter A. Lewis
Naval Postgraduate School (Code 55Lw)
Operations Research Department
Monterey, California 93943-5000

Prof. Iognaid G. O'Muircheartaigh
Naval Postgraduate School (Code 550M)
Operations Research Department
Monterey, California 93943-5000

Mayor FAP Juan M. Isusi
Centro de Informatica
Fuerza Aerea del Peru
Lima, Peru




